Closed podhrmic closed 3 years ago
In GitLab by @podhrmic on Oct 19, 2020, 13:18
changed time estimate to 4h
In GitLab by @podhrmic on Oct 19, 2020, 13:50
changed title from Better CI {-for notebooks-} to Better CI {+- notebooks, and component coverage+}
In GitLab by @podhrmic on Oct 19, 2020, 13:50
changed the description
In GitLab by @podhrmic on Oct 19, 2020, 17:18
mentioned in merge request !55
In GitLab by @bauer-matthews on Oct 20, 2020, 06:12
I suggest we run/test the notebooks and simulations we have and measure code coverage. Maybe we can use this: https://pypi.org/project/coverage/ ? We can augment where needed, but I suspect that is going to cover most (if not all) of the code.
In GitLab by @podhrmic on Oct 20, 2020, 16:38
Good idea. Currently I see two issues with the coverage metric:
1) our examples are run through a bash script `./run-csaf.sh`, and `coverage` needs a Python file
2) the notebooks need to be run with `ipython`, which `coverage run myprogram` doesn't understand
In GitLab by @bauer-matthews on Oct 21, 2020, 05:52
`./run-csaf` is just a wrapper around the python file `run_system.py`. For testing simulations, we can simply maintain a separate list of `run_system.py` commands to run. And we could probably auto-generate this list by having calls to `./run-csaf` log the arguments to the `run_system.py` calls that are made.
For notebooks, we could use `nbconvert` to generate python files to run.
This does create the problem of essentially having to run two copies of every test. But we could get around this by running the script/notebook tests in CI, and having a separate `cron`-based process to generate test coverage metrics.
Just throwing ideas out....
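The auto-generated command list mentioned above could be produced with a small logging shim, sketched here in Python. The log path, argument names, and config file names are all made up for illustration, and the actual forwarding to `run_system.py` is elided:

```python
import shlex

LOG_FILE = "run_system_invocations.log"  # hypothetical log path

def log_and_forward(args):
    """Append the exact run_system.py command line to a log, then forward.

    Only the logging idea is shown; the real wrapper would go on to
    invoke run_system.py with the same arguments.
    """
    cmd = "run_system.py " + " ".join(shlex.quote(a) for a in args)
    with open(LOG_FILE, "a") as f:
        f.write(cmd + "\n")
    return cmd

# Hypothetical invocations, as ./run-csaf might forward them:
log_and_forward(["--config", "f16_simple.toml"])
log_and_forward(["--config", "f16_shield.toml"])
```

The resulting log is then a ready-made list of commands that a separate coverage process can replay.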
In GitLab by @podhrmic on Oct 21, 2020, 10:26
@bauer-matthews thanks for the ideas! Have a look at this: https://gitlab-ext.galois.com/assuredautonomy/csaf_architecture/-/blob/feature/better-ci/test_notebooks.sh
Perhaps we could modify it so it measures coverage as well? Still not sure how to deal with the `ipython` requirement - the `nbconvert`-generated python files have a lot of `ipython` calls, so they don't run with regular python.
In GitLab by @bauer-matthews on Oct 21, 2020, 10:43
@podhrmic Can you give me a sense for what ipython calls remain and what they are doing?
In GitLab by @podhrmic on Oct 21, 2020, 11:58
Here: f16-shield.py
It is mostly the system calls - such as `!cat somefile`, and a weird string format that triggers a syntax error with regular python, for example: `f"pub-sub-plot.png"`
In GitLab by @bauer-matthews on Oct 21, 2020, 12:38
Ok. I was reading that cell magic is a likely culprit, so that makes sense. Those things aren't really going to contribute to code coverage anyway. Is there any way we can just filter them out? We could probably just do a `sed` on lines that start with `get_ipython()`.
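The `sed`-on-`get_ipython()` idea, sketched in Python for clarity. The sample `converted` source below is made up; real `nbconvert` output will differ, and a line-prefix match may not catch every IPython construct:

```python
def strip_ipython_lines(source: str) -> str:
    """Drop lines that start with get_ipython() from an nbconvert-generated
    script, so the remainder runs under plain Python (and coverage)."""
    return "\n".join(
        line for line in source.splitlines()
        if not line.lstrip().startswith("get_ipython()")
    )

# Hypothetical nbconvert output: shell magic becomes a get_ipython() call.
converted = """x = 1
get_ipython().system('cat somefile')
print(x)"""

print(strip_ipython_lines(converted))
```

This is roughly equivalent to `sed '/^get_ipython()/d'` on the generated file.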
In GitLab by @podhrmic on Oct 21, 2020, 13:28
We could, and that would be an easy solution. But as you said - we will have to run the tests twice: once with `ipython` to get an authentic test (for example, what if some of the cell magic is broken? We want to capture that in the CI), and once to get the coverage (with the filtered python file).
In GitLab by @bauer-matthews on Oct 22, 2020, 06:31
I suggest we run the authentic ipython and `./run-csaf` script-based tests in CI.
We can create a separate test set that runs the sanitized python notebooks and the various instantiations of `run_system.py` in a separate non-CI process to get test coverage metrics. For example, we could do this on release, or weekly, or with `develop`-to-`master` merges.
Thoughts?
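One way this split could look in GitLab CI - a sketch only: the job names and script paths are hypothetical, and it assumes a scheduled (e.g. weekly) pipeline has been configured in the project settings:

```yaml
# Sketch of a .gitlab-ci.yml split: authentic tests on every pipeline,
# coverage only on scheduled pipelines. Script names are made up.
test-notebooks:
  script:
    - ./test_notebooks.sh
  rules:
    - if: $CI_PIPELINE_SOURCE != "schedule"

coverage-metrics:
  script:
    - ./run_coverage.sh   # hypothetical: replays sanitized notebooks and logged run_system.py calls
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
```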
In GitLab by @podhrmic on Oct 22, 2020, 10:47
Sounds good to me.
In GitLab by @podhrmic on Oct 27, 2020, 14:22
mentioned in commit 66a1cb6aa068093cae68d91dbba883b41dd471c3
In GitLab by @podhrmic on Sep 18, 2020, 16:37
It is non-trivial to run the notebooks in the CI. The current nbconvert->PDF approach works OK, but has limits when extensive data visualization is involved (any time you call IPython). We need to find a better strategy for consistently testing the notebooks, including the visualization.
Another issue is making sure we have sufficient code coverage - how do we know we test all the configs/components?
@bauer-matthews do you have any suggestions for coverage of various configurations? (besides manually writing the tests)
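One non-manual option is to discover the example configs and run each one as a generated test case. A sketch only: the directory layout, the `.toml` extension, and the body of `run_config` are assumptions about the project, and the real runner call is elided:

```python
from pathlib import Path

def discover_configs(root):
    """Find every example config under root (extension is an assumption)."""
    return sorted(Path(root).glob("**/*.toml"))

def run_config(path):
    """Placeholder for invoking run_system.py on one config."""
    # e.g. subprocess.run(["python", "run_system.py", "--config", str(path)])
    return f"ran {path.name}"

# With pytest this could become:
#   @pytest.mark.parametrize("cfg", discover_configs("examples"))
for cfg in discover_configs("examples"):
    print(run_config(cfg))
```

New configs then get picked up automatically, so coverage of configurations tracks the repository rather than a hand-maintained test list.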