Closed tlvu closed 3 years ago
@cjauvin FYI
@tlvu I wonder if adding `# NBVAL_CHECK_OUTPUT` to each and every cell we want to check is what we really want. I mean, wouldn't we instead test all cells by default and rely on `# NBVAL_SKIP` when needed, like it's actually done?
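For context, nbval's tags are plain comments on the first line of a notebook code cell. A minimal sketch of the two conventions under discussion, with hypothetical cell contents (each tag comment below marks the start of what would be a separate cell):

```python
# NBVAL_CHECK_OUTPUT
# Under `pytest --nbval-lax`, only cells carrying this tag have their
# stored output compared against the output of a fresh execution.
total = sum(range(10))
print(total)

# NBVAL_SKIP
# Under regular `pytest --nbval`, every cell is checked by default;
# this tag opts a cell out entirely (it is not even executed).
```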
I wonder how many cells need to be validated or not. We should put the flag on the least represented category. When I asked @huard about this change during our meeting, I was under the impression that the vast majority of cells should now be skipped, but if this is not the case, I'm not so sure that's the way to go.
Could we split the test notebooks into two folders, each with its own validation rules?
> Could we split the test notebooks into two folders, each with its own validation rules?
@dbyrns We have to consider that these notebooks are first and foremost tutorial notebooks, and organizing the notebooks to fit our test suite instead of fitting the test suite to the notebooks might be misleading.
> @tlvu I wonder if adding `# NBVAL_CHECK_OUTPUT` to each and every cell we want to check is what we really want. I mean, wouldn't we instead test all cells by default and rely on `# NBVAL_SKIP` when needed, like it's actually done?
Thinking about this overnight: disabling all output checks by default is probably too extreme.

I would assume there will be at least one cell with `# NBVAL_CHECK_OUTPUT` in every notebook. I feel that output checking is the equivalent of the `assert` in regular unit tests, so it would be weird and a bit scary if we did not assert anything in the notebooks.

So either having `# NBVAL_CHECK_OUTPUT` with `--nbval-lax`, or `# NBVAL_IGNORE_OUTPUT` with regular `--nbval`, amounts to the same thing: we can never hide that we leverage the notebooks as test cases. Might as well be on the safe side and stick with regular `--nbval` like currently.
Most of the current cells do not need `# NBVAL_IGNORE_OUTPUT`. We will need it for cells that output an xarray dataset as HTML, so probably 1 or 2 cells per notebook, assuming notebooks are not gigantic.

For other ordering problems in the output, we can also use `# NBVAL_IGNORE_OUTPUT` instead of doing hacks like in https://github.com/Ouranosinc/pavics-sdi/commit/d19ad355b522b6f5f012308b4fe94a079ca76c37, which makes the code harder to read from the tutorial perspective.
What do you think, @huard @tlogan2000 @cjauvin?
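To illustrate the kind of cell meant here, a hypothetical sketch: any output containing memory addresses (an object repr, or the rich HTML repr of an xarray dataset) can never match a stored reference, so tagging the cell is cleaner than hacking its output into a stable form:

```python
# NBVAL_IGNORE_OUTPUT
# The cell still executes, but its output is not compared: object ids /
# memory addresses differ on every run.
obj = object()
address = hex(id(obj))
print(address)
```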
> Most of the current cells do not need `# NBVAL_IGNORE_OUTPUT`.
It answers my question about how many cells need to be validated or not. And given the answer, the obvious choice would be to add a flag to those specific cells and keep the regular `--nbval`.
Well, that's the current situation, and I find that we're spending a lot of time tweaking the notebooks. The starting point for this discussion is that we get a lot of tests failing for irrelevant details, which is distracting us from more serious issues.
> Well, that's the current situation, and I find that we're spending a lot of time tweaking the notebooks. The starting point for this discussion is that we get a lot of tests failing for irrelevant details, which is distracting us from more serious issues.
By tweaking, do you mean adding the `# NBVAL_IGNORE_OUTPUT` where the result is not consistent, or trying to make the cell output consistent? I agree that the latter is pointless, but the former looks less painful to me than adding `# NBVAL_CHECK_OUTPUT` everywhere we want to check a consistent cell, if there are more of them.
I feel we are not ready to enable `--nbval-lax` for now. There is no plan yet to audit all the other notebooks to add `# NBVAL_CHECK_OUTPUT`. I do not want to decrease the confidence level of our test suite.

I also need the `.ncml` on pavics.ouranos.ca right now to test the Raven notebooks, so let's revisit `--nbval-lax` another time.
Ok, let's sync the lax change with an update of the notebooks to add `# NBVAL_CHECK_OUTPUT` tags to cells.
> Ok, let's sync the lax change with an update of the notebooks to add `# NBVAL_CHECK_OUTPUT` tags to cells.
This can be done ahead of time, before the `--nbval-lax` change, since adding `# NBVAL_CHECK_OUTPUT` in non-lax mode has no effect. Basically, we don't have to wait to sync with the lax change.
Also, we have to agree on how tightly we want our tutorial/documentation notebooks to reflect the deployed reality.
Ex: in https://github.com/Ouranosinc/pavics-sdi/pull/198 and https://github.com/bird-house/finch/pull/149, it's nice to be told loudly that `birdy` 0.7.0 has a new `output_formats` option in the client and that NumPy is throwing new deprecation warnings. But is it worth the time to perform the notebook output updates?
@huard @tlvu I'm still not sure the `--nbval-lax` option offers us the desired behaviour. After having audited the existing notebook cells to add `# NBVAL_CHECK_OUTPUT`, we would have to specify this for every single cell we add in a notebook later on. We will, for sure, forget to do this, resulting in silent failures.
Keeping the existing system only requires us to apply a one-time `# NBVAL_SKIP` annotation to specific cells. Once done, we don't have to think about tests any further, since we know everything is tested by default.
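A hypothetical sketch of that one-time annotation, assuming a cell that would reach out to a live deployment (the server name is taken from the discussion above; the cell body is invented for illustration):

```python
# NBVAL_SKIP
# nbval never executes a tagged cell, so an outage of the server this
# cell queries cannot fail the test suite.
server = "https://pavics.ouranos.ca"
print(f"this cell would query {server}")
```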
But, again, this is the current situation, and I believe it's wasting our time on nitpicking. It makes the threshold for failure so low that there is almost always one test failing somewhere, which defeats the purpose of those tests.
I'm open to discussion, but please let's agree that the status quo is unsustainable.
Let's talk about that in our next meeting. It would be helpful to know what causes the unsustainable situation right now.
Could we add a flag at the notebook level at least, so that we have a middle ground between a global end-to-end flag and per-cell flags?
Small code change, fairly big impact here. The README has the background.
Impact:

- (removed, will re-evaluate this next time) With the switch to `--nbval-lax`, which disables output checking, some notebook failures might fly under the radar. We have to audit all notebooks and add `# NBVAL_CHECK_OUTPUT` to any cell that is critical to the correctness of the notebooks.
- With `.ncml` links always coming from our production pavics.ouranos.ca server, tests will fail if the production server is down. This makes the tests less "independent".