edoddridge / aronnax

An idealised isopycnal model that can be run either with n+1/2 layers, or with n layers and variable bathymetry.
http://aronnax.readthedocs.io/en/latest/
MIT License

Floating point exceptions when running tests #230

Closed (renskegelderloos closed 3 years ago)

renskegelderloos commented 3 years ago

6 of the 34 tests fail when I run pytest, all in output_preservation_test.py. The first 4 (all variants of test_beta_plane_gyre_free_surf_Hypre_MPI) signal a floating point exception in the drv.simulate stage, while the other 2 (test_periodic_BC_Hypre and test_outcropping_Hypre) get through the drv.simulate stage successfully but then quit with an IEEE_UNDERFLOW_FLAG floating point exception at the asserts_outputs_close stage. Any ideas what could be going wrong?

edoddridge commented 3 years ago

The underflow flag can happen even when the simulation has worked (it means that at least one number somewhere got too small to be represented properly). Can you paste in the full output of pytest? There should also be a number of image outputs in the directories of the failed tests - these should show the difference between the output produced when you ran the test and the previously blessed output that it is being compared against.
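To illustrate why an underflow on its own is usually harmless, here is a minimal Python sketch (not part of Aronnax; the helper name `underflows_to_zero` is made up for this example). It shows a product of two tiny but nonzero doubles rounding to zero, which is the kind of event that raises the IEEE underflow flag without the calculation itself having gone wrong.

```python
import sys


def underflows_to_zero(x, y):
    """Return True if x and y are both nonzero but their product rounds to 0.0."""
    return x != 0.0 and y != 0.0 and x * y == 0.0


# Smallest positive normal double, roughly 2.2e-308.
tiny = sys.float_info.min

# The exact product is about 5e-616, far below what a double can hold,
# so the result is flushed to zero.
print(underflows_to_zero(tiny, tiny))

# 1e-20 is comfortably representable, so no underflow occurs here.
print(underflows_to_zero(1e-10, 1e-10))
```

In a model run, values like these typically come from fields that are decaying towards zero, which is why the flag can appear alongside perfectly good output.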

renskegelderloos commented 3 years ago

Thanks for your quick reply! The full pytest log is attached (pytest_6fails.log). The 4 tests that fail at the drv.simulate stage don't produce figures. Of the other 2, the outcropping test case does indeed seem to have run well and produced good results. The periodic_BC_Hypre test case produces results that are nowhere near the blessed output.

edoddridge commented 3 years ago

No worries.

Thanks for the log. It looks like all of the non-hypre tests pass, which is good news. You can confirm this by running pytest -k 'not Hypre' which will automatically deselect any tests using the external solver library.

But that still leaves us with the tests that use Hypre failing, and some of them failing by blowing up, which isn't great. You mentioned in the other thread (#229) that you were having trouble installing Hypre from that URL. Did it end up working? If not, where did you get it from?

I think it would be worth running make clean to remove traces of previous builds and then retrying the test suite.

The fact that test_beta_plane_gyre_free_surf_Hypre_MPI fails with

Inconsistency between h and eta:  1.579%
Inconsistency between h and eta:  1.579%
Inconsistency between h and eta:  1.615%
Inconsistency between h and eta:  1.615%
Inconsistency between h and eta:  4.961%
Inconsistency between h and eta:  4.961%
Inconsistency between h and eta:  4.841%
Inconsistency between h and eta:  4.841%
Inconsistency between h and eta: 16.566%
Inconsistency between h and eta: 16.128%
Inconsistency between h and eta: 16.128%
Inconsistency between h and eta: 16.566%
Inconsistency between h and eta: 95.702%
Inconsistency between h and eta: 90.589%
Inconsistency between h and eta: 95.702%
Inconsistency between h and eta: 90.589%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%
Inconsistency between h and eta: ******%

is particularly troubling, since it means that the solution has diverged into NaNs. This configuration should be very stable.
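For anyone wanting to check a long log for the same pattern, the following is a rough Python sketch, not something shipped with Aronnax. It assumes the log lines look exactly like the ones quoted above, and it treats a field of asterisks as "value too large for the output format", which is how Fortran formatted output reports a number that does not fit its field width. The names `parse_inconsistency` and `first_overflow` are invented for this example.

```python
import re

# Matches e.g. "Inconsistency between h and eta:  1.579%" or "... ******%".
LINE = re.compile(r"Inconsistency between h and eta:\s*([0-9]+\.[0-9]+|\*+)%")


def parse_inconsistency(log_text):
    """Return the reported percentages in order; None marks an overflowed field."""
    values = []
    for match in LINE.finditer(log_text):
        token = match.group(1)
        values.append(None if token.startswith("*") else float(token))
    return values


def first_overflow(values):
    """Index of the first overflowed entry, or None if every value was printable."""
    for index, value in enumerate(values):
        if value is None:
            return index
    return None


# A few lines copied from the log above.
sample = """\
Inconsistency between h and eta:  1.579%
Inconsistency between h and eta: 16.566%
Inconsistency between h and eta: 95.702%
Inconsistency between h and eta: ******%
"""

values = parse_inconsistency(sample)
print(values)
print(first_overflow(values))
```

A steadily growing sequence that ends in overflowed fields, as here, points to the solution diverging rather than to a one-off glitch.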

renskegelderloos commented 3 years ago

I agree, the blowing up is what's got me worried too.

pytest -k 'not Hypre' passes all tests. However, it deselects 11 tests while only 6 fail, so it looks like at least some of the Hypre tests do pass. I also checked the executable; all 6 failed tests use the aronnax_external_solver_test, but the test_vertical_thickness_diffusion_Hypre_3_layers also uses this executable and does not fail.
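The bookkeeping above can be sketched in a few lines of Python. This is a simplification and rests on some assumptions: real `-k` expressions also match module and class names and support boolean operators, and recent pytest versions match case-insensitively. The function `split_by_keyword` and the name `test_example_without_external_solver` are hypothetical; the other test names are the ones mentioned in this thread, not the full suite.

```python
def split_by_keyword(test_names, keyword):
    """Roughly mimic `pytest -k 'not <keyword>'` for a plain substring."""
    needle = keyword.lower()
    selected = [name for name in test_names if needle not in name.lower()]
    deselected = [name for name in test_names if needle in name.lower()]
    return selected, deselected


# Test names taken from this thread, plus one hypothetical non-Hypre test.
test_names = [
    "test_beta_plane_gyre_free_surf_Hypre_MPI",
    "test_periodic_BC_Hypre",
    "test_outcropping_Hypre",
    "test_vertical_thickness_diffusion_Hypre_3_layers",
    "test_example_without_external_solver",  # hypothetical placeholder
]

# Tests reported as failing in this thread.
failed = {
    "test_beta_plane_gyre_free_surf_Hypre_MPI",
    "test_periodic_BC_Hypre",
    "test_outcropping_Hypre",
}

selected, deselected = split_by_keyword(test_names, "Hypre")

# Hypre tests that were deselected by the filter but did not fail.
passing_hypre = [name for name in deselected if name not in failed]
print(passing_hypre)
```

Comparing the deselected set against the failures in this way narrows the problem to a subset of the Hypre configurations rather than the external solver as a whole.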

I was not successful in retrieving Hypre by just cloning aronnax (reference not in tree error). I cloned it separately from https://github.com/hypre-space/hypre, which is the same link as the one in issue #229. It built fine though.

I tried the make clean route, which still yields the same 6 failed tests.

edoddridge commented 3 years ago

I can't reproduce the error you describe when cloning the model. I tried cloning aronnax from the repo, and it worked as expected. Is it possible that you haven't updated your code since #229 was merged? Alternatively, there are a couple of outdated feature branches hanging around. You aren't trying to use one of those, are you? Would you be able to try downloading a fresh copy of the repository into a new folder?

I can't say I really understand what is going on. When I cloned a new copy of the repository, hypre came with it as expected and after following the build instructions everything worked and the tests all passed.

It might be worth trying the testing protocols in Hypre (https://hypre.readthedocs.io/en/latest/ch-misc.html#testing-the-library) to make sure that the copy you've installed is working correctly.

The system information in your pytest log file all looks good to me, so there's nothing obvious to try there.

renskegelderloos commented 3 years ago

I already removed the entire installation and tried again, so I'm sure. I checked with 'git show' and I really have the latest version. It's getting more confusing now, as the cloning seems to work fine on my Mac, but not on the clusters I'm trying to install Aronnax on. I'm guessing it's a hardware-specific issue, and I'll get in touch with the system helpdesk to see if that is the case. I'll close this issue for now; we can always reopen it if I was wrong. Thanks for your help so far!

edoddridge commented 3 years ago

I'm sorry I couldn't help more. Following up with the system helpdesk is a good idea. Feel free to loop me into the discussions if I can help at all.

renskegelderloos commented 3 years ago

No worries; I got it to work with all tests passing. Thanks for your help!