Closed tlvu closed 3 years ago
@cjauvin FYI
@tlvu I wonder if adding `# NBVAL_CHECK_OUTPUT` to each and every cell we want to check is what we really want. I mean, wouldn't we instead test all cells by default and rely on `# NBVAL_SKIP` when needed, like it's actually done?
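For context, nbval's tags are plain comments on the first line of a notebook code cell. A minimal sketch of the two conventions under discussion, with hypothetical cell contents (each tag comment below marks the start of what would be a separate cell):

```python
# NBVAL_CHECK_OUTPUT
# Under `pytest --nbval-lax`, only cells carrying this tag have their
# stored output compared against the output of a fresh execution.
total = sum(range(10))
print(total)

# NBVAL_SKIP
# Under regular `pytest --nbval`, every cell is checked by default;
# this tag opts a cell out entirely (it is not even executed).
```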
I wonder how many cells need to be validated or not. We should put the flag on the least represented category. When I asked @huard about this change during our meeting, I was under the impression that the vast majority of cells should now be skipped, but if this is not the case, I'm not so sure that's the way to go.
Could we split the test notebooks into two folders, each with its own validation rules?
> Could we split the test notebooks into two folders, each with its own validation rules?
@dbyrns We have to consider that these notebooks are first and foremost tutorial notebooks, and organizing the notebooks to fit our test suite instead of fitting the test suite to the notebooks might be misleading.
> @tlvu I wonder if adding `# NBVAL_CHECK_OUTPUT` to each and every cell we want to check is what we really want. I mean, wouldn't we instead test all cells by default and rely on `# NBVAL_SKIP` when needed, like it's actually done?
Thinking about this overnight: disabling all output checks by default is probably too extreme.

I would assume there will be at least one cell with `# NBVAL_CHECK_OUTPUT` in every notebook. I feel that output checking is the equivalent of the `assert` in regular unit tests, so it would be weird and a bit scary if we did not assert anything in the notebooks.

So either having `# NBVAL_CHECK_OUTPUT` with `--nbval-lax`, or `# NBVAL_IGNORE_OUTPUT` with regular `--nbval`, amounts to the same thing: we can never hide that we leverage the notebooks as test cases. Might as well be on the safe side and stick with regular `--nbval` like currently.
Most of the current cells do not need `# NBVAL_IGNORE_OUTPUT`. We will need it for cells that output an xarray dataset as HTML, so probably 1 or 2 cells per notebook, assuming notebooks are not gigantic.

For other ordering problems in the output, we can also use `# NBVAL_IGNORE_OUTPUT` instead of doing hacks like in https://github.com/Ouranosinc/pavics-sdi/commit/d19ad355b522b6f5f012308b4fe94a079ca76c37, which makes the code harder to read from the tutorial perspective.
What do you think, @huard @tlogan2000 @cjauvin?
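To illustrate the kind of cell meant here, a hypothetical sketch: any output containing memory addresses (an object repr, or the rich HTML repr of an xarray dataset) can never match a stored reference, so tagging the cell is cleaner than hacking its output into a stable form:

```python
# NBVAL_IGNORE_OUTPUT
# The cell still executes, but its output is not compared: object ids /
# memory addresses differ on every run.
obj = object()
address = hex(id(obj))
print(address)
```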
> Most of the current cells do not need `# NBVAL_IGNORE_OUTPUT`.
It answers my question about how many cells need to be validated or not. And given the answer, the obvious choice would be to add a flag to those specific cells and keep the regular `--nbval`.
Well, that's the current situation, and I find that we're spending a lot of time tweaking the notebooks. The starting point for this discussion is that we get a lot of tests failing for irrelevant details, which is distracting us from more serious issues.
> Well, that's the current situation, and I find that we're spending a lot of time tweaking the notebooks. The starting point for this discussion is that we get a lot of tests failing for irrelevant details, which is distracting us from more serious issues.
By tweaking, do you mean adding the `# NBVAL_IGNORE_OUTPUT` where the result is not consistent, or trying to make the cell output consistent? I agree that the latter is pointless, but the former looks less painful to me than adding `# NBVAL_CHECK_OUTPUT` everywhere we want to check a consistent cell, if there are more of them.
I feel we are not ready to enable `--nbval-lax` for now. There is no plan yet to audit all the other notebooks to add `# NBVAL_CHECK_OUTPUT`. I do not want to decrease the confidence level of our test suite.

I also need the `.ncml` on pavics.ouranos.ca right now to test the Raven notebooks, so let's revisit `--nbval-lax` another time.
Ok, let's sync the lax change with an update of the notebooks to add `# NBVAL_CHECK_OUTPUT` tags to cells.
> Ok, let's sync the lax change with an update of the notebooks to add `# NBVAL_CHECK_OUTPUT` tags to cells.
This can be done ahead of time, before the `--nbval-lax` change, since adding `# NBVAL_CHECK_OUTPUT` in non-lax mode has no effect. Basically, we don't have to wait to sync with the lax change.
Also, we have to agree on how tightly we want our tutorial/documentation notebooks to reflect the deployed reality.
Ex: in https://github.com/Ouranosinc/pavics-sdi/pull/198 and https://github.com/bird-house/finch/pull/149, it's nice to be told loudly that `birdy` 0.7.0 has a new `output_formats` option in the client and that NumPy is throwing new deprecation warnings. But is it worth the time to perform the notebook output updates?
@huard @tlvu I'm still not sure the `--nbval-lax` option offers us the desired behaviour. After having audited the existing notebook cells to add `# NBVAL_CHECK_OUTPUT`, we would have to specify this for every single cell we add in a notebook later on. We will, for sure, forget to do this, resulting in silent failures.
Keeping the existing system only requires us to apply a one-time `# NBVAL_SKIP` annotation to specific cells. Once done, we don't have to think about tests any further, since we know everything is tested by default.
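A hypothetical sketch of that one-time annotation, assuming a cell that would reach out to a live deployment (the server name is taken from the discussion above; the cell body is invented for illustration):

```python
# NBVAL_SKIP
# nbval never executes a tagged cell, so an outage of the server this
# cell queries cannot fail the test suite.
server = "https://pavics.ouranos.ca"
print(f"this cell would query {server}")
```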
But, again, this is the current situation, and I believe it's wasting our time on nitpicking. It makes the threshold for failure so low that there is almost always one test failing somewhere, which defeats the purpose of those tests.
I'm open to discussion, but please let's agree that the status quo is unsustainable.
Let's talk about that in our next meeting. It would be helpful to know what causes the unsustainable situation right now.
Could we add a flag at the notebook level at least, so that we have a middle ground between a global end-to-end flag and per-cell flags?
Small code change, fairly big impact here. The README has the background.
Impact:

- (removed, will re-evaluate this next time) With the switch to `--nbval-lax`, which disables output checking, some notebook failures might fly under the radar. We have to audit all notebooks and add `# NBVAL_CHECK_OUTPUT` to any cell that is critical to the correctness of the notebooks.
- With `.ncml` links always coming from our production pavics.ouranos.ca server, tests will fail if the production server is down. This makes the tests less "independent".