NDF-Poli-USP / spyro

Wave propagators for seismic domains with application to full waveform inversion.
GNU General Public License v3.0
34 stars 15 forks source link

Fixing codecov tracking wrong workflow #80

Closed Olender closed 2 years ago

Olender commented 2 years ago

This started because the codecov reports weren't covering both the adjoint and the parallel tests, do to them being linked to the previous circleCI workflow (which only tested in serial). Since we have set up a runner (so we can actually do adjoint and parallel tests), I wanted to update it to track the correct workflow. Ended up adding a 3D parallel test in there as well.

Both the 2D and 3D tests now look at wave propagation in heterogeneous velocity cases (marmousi and overthurst). They are simple tests that look at receiver pairs and measures if the velocity of the wave's propagation between them matches water velocity. The 2D case does this with ensemble parallelism looking at waves propagated from many sources, testing each one in order to check if the ensemble parallelism is actually working.

Unfortunately, even when tests pass the coverage package gets stuck when trying to merge all the coverage files in parallel. They seem to merge fine, but produce an internal error that fails the build. This happens only on the large parallel tests (3D and gridpoint calculator). So I have separated those tests (sent the gridpoint calculator test to serial and now run the parallel 3D test with only pytest, without coverage), and also included a second run of the 3D test with coverage that ignores errors. This way the build breaks if something fails the 3D propagation and we also get coverage. However, this also significantly increases workflow runtime, so if anyone has free time I ask to have a look at a better fix for generating coverage reports on our large 3D tests.

Olender commented 2 years ago

Most of this was done in order to have good code coverage in main before we add major updates

krober10nd commented 2 years ago

Did you take a look at what I did here?

https://github.com/krober10nd/SeismicMesh/blob/master/tox.ini

could you post the error you receive that you mentioned?

codecov-commenter commented 2 years ago

Codecov Report

Merging #80 (aaf86d2) into main (a02eb9d) will increase coverage by 28.21%. The diff coverage is 75.00%.

@@             Coverage Diff             @@
##             main      #80       +/-   ##
===========================================
+ Coverage   53.75%   81.97%   +28.21%     
===========================================
  Files          29       28        -1     
  Lines        2156     1875      -281     
===========================================
+ Hits         1159     1537      +378     
+ Misses        997      338      -659     
Impacted Files Coverage Δ
spyro/domains/quadrature.py 95.12% <ø> (+46.84%) :arrow_up:
spyro/sources/Sources.py 86.79% <ø> (+29.01%) :arrow_up:
spyro/sources/__init__.py 100.00% <ø> (ø)
spyro/solvers/helpers.py 63.33% <33.33%> (+0.37%) :arrow_up:
spyro/tools/input_models.py 95.93% <37.50%> (+33.50%) :arrow_up:
spyro/io/io.py 75.11% <100.00%> (+22.53%) :arrow_up:
spyro/plots/plots.py 96.55% <100.00%> (+75.86%) :arrow_up:
spyro/tools/gradient_test_ad.py 98.11% <100.00%> (+78.54%) :arrow_up:
spyro/tools/grid_point_calculator.py 57.83% <0.00%> (+1.04%) :arrow_up:
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5a3036d...aaf86d2. Read the comment docs.

Olender commented 2 years ago

could you post the error you receive that you mentioned?

It's an internal error shown in the build workflow file (https://github.com/NDF-Poli-USP/spyro/runs/6062483727?check_suite_focus=true), under the "Covering parallel 3D forward test". It's hard to reproduce outside of the runner without firedrake running on a large case. The "INTERNALERROR> coverage.exceptions.DataError: Couldn't use data file '/home/olender/actions-runner/_work/spyro/spyro/.coverage.hydra03.1679797.346279'" seems related to coverage merging each coverage report from the parallel processes into one. From what I can tell in the runner, each parallel coverage file is generated and most are merged. However some (1+ out of the 15) are sometimes left-out of the coverage report and deleted by coverage before merging, generating that error. This was a known issue for coverage in the past, but was fixed in the version we use. I have been trying different pytest-cov and coverage versions, with some success (although intermittently) locally but no success in hydra3 (the runner). I could only reproduce the error with relatively big firedrake-based cases. If we want to fix the coverage issue, the next steps would be to find the smallest firedrake-based case that generates this error and try to see single out what is causing it, while also opening up an issue in the coverage github. However, running 3D cases to generate this error is time consuming.

I would like to point out that all the pytests work and cover now 80+% of the code. Since the issue is only related to the coverage report generation and the temporary fix (of running pytest regularly without coverage first and then the same test with coverage and passing this error) only has the downside of increasing build time, I think this should get merged and an issue related to coverage be opened for a later fix.

Did you take a look at what I did here?

https://github.com/krober10nd/SeismicMesh/blob/master/tox.ini

I had a look into tox, but I am not sure it would help with this problem, since we are limitled to a single distribution of firedrake and python in the runner (this could be fixed in the future and/or setup to run in containers). Or did you mean the -ignore=pybind11argument? Haven't tried that, what does it do?