Dummy test - Githubissues

lsoucasse commented 2 months ago

This work introduces the first integration test for proteus using the dummy model. It checks the full data stored in runtime_helpfile.csvand compares all png plot output.

The test takes 11 min to run on the CI (instead of 3 min locally).
The coverage goes from 16% to 40%.
Results are not reproduced exactly from one run to another. See this passing and this not with same tolerance 1e-3.

lsoucasse commented 2 months ago

I am not entirely satisfied with the image comparison. I have to set the tolerance very high also the images look the same. I think it is because of the required resizing step. I attach a diff example which gives a rms of 12. Any idea for improvement?

ref-failed-diff

nichollsh commented 2 months ago

Thanks for this, @lsoucasse. The coverage improvement is nice to see.

Regarding comparison of the helpfiles: does it tell you which variables are not up to the required tolerance? If it's on variables that don't matter too much, then this could be okay. If it's coming from the dummy test, then maybe it is associated with undefined behaviour in the last iteration?

Regarding the image comparison: if this is causing problems, maybe we can avoid using this method? If the check on the helpfile passes, then that should be sufficient, since the plot is only plotting the data in the helpfile anyway.

nichollsh commented 2 months ago

Ah, I can actually see this information in the output of the CI run. The variable which triggers the failure is the CH4 molar abundance, which is indeed associated with the outgassing and escape. The values which differ are:

At positional index 156, first diff: 4.281912e+17 != 4.288642e+17

Which I the penultimate iteration, and would certainly be impacted by the escape. This could simply be a case of the dummy model putting the simulation into an extreme case, since it uses a very high escape rate which reduces the value from 1e20 to 4e17 to zero across only three iterations. One option would be to create a dummy test with a smaller escape rate, which still tests the physics in a reasonable manner, and might hopefully make things more stable? Another easier option would be to not check these high order of magnitude variables, such as molecular molar and kg abundances. It's easier to check their partial pressures, and in theory they are measuring the same physics.

As an aside... I think this indicates that the time-stepping should account for escape. We shouldn't be taking such large steps that so much atmosphere (relatively) can escape in one iteration, particularly as we get towards the convergence criteria. Something to think about for the future.

nichollsh commented 2 months ago

And a note on the model runtime: I have found that the model runs a lot faster if you change the plotting period to a larger number.

lsoucasse commented 2 months ago

And a note on the model runtime: I have found that the model runs a lot faster if you change the plotting period to a larger number.

I can experiment with a lower dummy rate and see if it passes with lower tolerance. Would a value of 2e5 be relevant?

We only need the final plot so I guess setting the plot_iterfreq key to 0 would be OK?

nichollsh commented 2 months ago

Yeah 2e5 kg/s sounds okay. We could probably even do 1e4 kg/s. Setting plot_iterfreq=0 should indeed prevent any plotting except in the final step.

stefsmeets commented 2 months ago

Hi @lsoucasse I would say that if the plots are not reproducible, then that is a great result of having this test in the first place! I would avoid rescaling / increasing the tolerance to get the tests to pass, and instead mark the tests as xfail until we figure out what is causing the difference.

lsoucasse commented 2 months ago

Hi @lsoucasse I would say that if the plots are not reproducible, then that is a great result of having this test in the first place! I would avoid rescaling / increasing the tolerance to get the tests to pass, and instead mark the tests as xfail until we figure out what is causing the difference.

But to me the plots are reproducible. I extracted the CI results using GH artefacts and I cannot see the difference. The diff plot in the comment above shows the frame and the labels which are artifical differences. I had to resize the images before comparing and I blame this step for creating these artificial differences and high rms.

stefsmeets commented 2 months ago

Ah, could be font rendering. Or maybe that your setup picks a font that is unavailable on the CI or vice versa.

lsoucasse commented 2 months ago

With escape rate of 2e4 kg/s I can drop the tolerance down to 1e-4. And plotting only at the end reduces the total test run time to 4 min. I think plot_iterfreq=0 (or at least 50/100) should be the default option in all config files as is slows the code so much.

I set down the tolerance for the plots and flagged it with xfail. I will create an issue for solving this.

FormingWorlds / PROTEUS

Dummy test #188