Make refactoring test suite portable across machines

axch commented 7 years ago

The symptom: Running the MIM core in the way the test suite runs it produces different results on Ed's laptop than it did on my desktop, so bit-for-bit checking against checked-in known-good outputs fails.

Possible causes:

1. Binary format compatibility problem (Fortran "unformatted" output is alleged to be system-dependent).
2. Compiler, runtime, or OS version differences leading to (hopefully small) differences in numerical output.

Possible solutions:

If it's only cause 2, could write a small Python program that compares corresponding pairs of output files semantically, e.g., by computing pointwise relative error, or pointwise absolute difference divided by maximum array element (a version of relative error for arrays of things that are supposed to be on the same scale).
- Pro: Would be useful to check soundness of changes that are expected to change the numerics but not much.
- Pro: If equipped with visualizations, could become a user-facing facility (What changed between this run and that one?)
- Con: Doesn't address cause 1.
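A minimal sketch of the semantic comparison in the first solution, in plain Python so it runs without NumPy. Both metrics are the ones described above; the function names and the 1e-10 tolerance in `outputs_match` are hypothetical choices, and reading the arrays out of the MIM output files is left out.

```python
import math
from typing import Sequence


def max_pointwise_relative_error(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Largest |actual - expected| / |expected| over all points.

    A point where expected == 0 contributes 0.0 if actual matches exactly,
    otherwise inf, which makes this metric fragile for fields that cross zero.
    """
    if len(expected) != len(actual):
        raise ValueError("arrays differ in length")
    worst = 0.0
    for e, a in zip(expected, actual):
        diff = abs(a - e)
        if e == 0.0:
            err = 0.0 if diff == 0.0 else math.inf
        else:
            err = diff / abs(e)
        worst = max(worst, err)
    return worst


def max_scaled_difference(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Largest |actual - expected| divided by the largest |expected| in the array.

    This is the relative-error variant for arrays whose entries share one scale.
    """
    if len(expected) != len(actual):
        raise ValueError("arrays differ in length")
    scale = max((abs(e) for e in expected), default=0.0)
    worst_diff = max((abs(a - e) for e, a in zip(expected, actual)), default=0.0)
    if scale == 0.0:
        return 0.0 if worst_diff == 0.0 else math.inf
    return worst_diff / scale


def outputs_match(expected: Sequence[float], actual: Sequence[float],
                  tolerance: float = 1e-10) -> bool:
    """Hypothetical pass/fail rule: the scaled difference stays within a tolerance."""
    return max_scaled_difference(expected, actual) <= tolerance
```

For example, with a reference array `[0.0, 1.0, -2.0]` and a new run `[1e-14, 1.0, -2.0]`, `max_pointwise_relative_error` returns `inf` because the reference value at the first point is exactly zero, while `max_scaled_difference` returns `5e-15`. That is the case the scaled variant is meant for: fields that pass through zero.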
Save the outputs in some portable format, either directly from the MIM core, or as a post-process conversion written in Python.
- Pro: This is a useful user-facing facility to have anyway.
- Con: Doesn't address cause 2 by itself, but does if combined with the preceding.
- Con: Writing formatted output is presumably slower than unformatted. Doing this automatically could affect the total latency of a run, especially if we do the simple thing of doing the conversion in post-processing, after the core simulation has finished.
- Con: The formatted files will presumably be larger, which would make them yet more annoying to keep checked in. Need to make a choice as to whether to use a human-readable format (still larger, conversion may be even slower) or a portable binary format. Are there standard formats in either category? netCDF?
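One way the Python post-processing conversion could look, as a sketch under loud assumptions: that the MIM core writes Fortran sequential-access unformatted files whose records are framed by 4-byte length markers (gfortran's default) and contain only float64 values, and that JSON is an acceptable human-readable target. The byte order and marker size are parameters because they are exactly the system-dependent parts behind cause 1; the function names are hypothetical.

```python
import json
import struct


def read_sequential_records(raw: bytes, byte_order: str = "<",
                            marker_fmt: str = "i", value_fmt: str = "d") -> list[list[float]]:
    """Split Fortran sequential-access unformatted bytes into records of numbers.

    Assumes (hypothetically) that each record has a leading and a trailing length
    marker and holds values of a single type, all in the given byte order.
    """
    marker = struct.Struct(byte_order + marker_fmt)
    value_size = struct.calcsize(byte_order + value_fmt)
    records = []
    pos = 0
    while pos < len(raw):
        (nbytes,) = marker.unpack_from(raw, pos)
        pos += marker.size
        if nbytes % value_size != 0:
            raise ValueError(f"record length {nbytes} is not a multiple of {value_size}")
        count = nbytes // value_size
        values = struct.unpack_from(f"{byte_order}{count}{value_fmt}", raw, pos)
        pos += nbytes
        # The trailing marker must repeat the leading one; a mismatch usually
        # means the marker size or byte order assumption is wrong for this file.
        (trailer,) = marker.unpack_from(raw, pos)
        pos += marker.size
        if trailer != nbytes:
            raise ValueError(f"record markers disagree: {nbytes} vs {trailer}")
        records.append(list(values))
    return records


def records_to_json(records: list[list[float]]) -> str:
    """Serialize the records as a portable, human-readable JSON document."""
    return json.dumps({"records": records})
```

Python writes floats using their shortest round-tripping representation, so the JSON text loses no float64 precision, at the size and speed cost noted above. A netCDF writer could take the place of `records_to_json` without changing the reader.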
Change the test process not to rely on checked-in known-good outputs, but instead to clone the MIM repo at a specific version (in a subdirectory), and generate the known-good outputs using that version.
- Note: It is not too difficult to arrange for this to download the blessed MIM core and build the outputs from it only once, rather than wasting time doing it on every test run.
- Pro: Less auto-generated stuff in the git repository.
- Pro: Should solve both cause 1 and cause 2 at once.
- Con: (Somewhat) more complex test harness.
- Con: Not reusable for other purposes; generates none of the co-benefits of the other two solutions.
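The build-once arrangement in the third solution's note could be sketched like this. It assumes the blessed version is identified by a commit hash, that a plain `make` in the checkout builds the core (a hypothetical build step), and that a stamp file is an acceptable way to remember what was built; generating the known-good outputs would then mean running that binary on the test inputs. The helper name is hypothetical, and the `run` parameter exists only so the logic can be exercised without network access.

```python
import subprocess
from pathlib import Path

# Repository URL taken from this thread; everything else here is illustrative.
MIM_REPO_URL = "https://github.com/edoddridge/MIM"


def ensure_reference_build(dest: Path, commit: str, run=subprocess.run) -> bool:
    """Check out and build the blessed MIM version once; return True if work was done.

    A stamp file records which commit was last built, so repeated test runs skip
    the clone and build, while changing the pinned commit triggers a rebuild.
    """
    stamp = dest / ".reference-build"
    if stamp.is_file() and stamp.read_text() == commit:
        return False  # already built by an earlier test run
    if dest.exists():
        run(["git", "fetch", "origin"], cwd=dest, check=True)  # pick up newly pinned commits
    else:
        run(["git", "clone", MIM_REPO_URL, str(dest)], check=True)
    run(["git", "checkout", commit], cwd=dest, check=True)
    run(["make"], cwd=dest, check=True)  # hypothetical build step; substitute the real one
    stamp.write_text(commit)
    return True
```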

edoddridge commented 7 years ago

Some thoughts:

"Save the outputs in some portable format, either directly from the MIM core, or as a post-process conversion written in Python" deserves its own ticket. This is potentially a very large feature that should be discussed in its own right.
If this is caused by compiler/runtime differences, then running the test suite without any optimisations may be enough to fix the issue.
I like solution one because it gives us a mechanism for examining how numerical changes affect the output. This will be handy when dealing with #1. And, as mentioned, it could be a useful facility for users.

edoddridge commented 7 years ago

I should also have mentioned that I need to edit https://github.com/edoddridge/MIM/blob/master/test/output_preservation_test.py#L51 and https://github.com/edoddridge/MIM/blob/master/test/output_preservation_test.py#L53 to contain "./MIM" in order for the test suite to run on my mac laptop.

edoddridge commented 7 years ago

Now that I've installed valgrind, the two beta_plane_gyre tests still fail, and valgrind reports a tiny memory leak. The leak seems not to be real, but just an artefact of the OS.

This may be more evidence that cause 2 is responsible.

axch commented 7 years ago

Whether ./MIM is needed depends on whether . appears in your $PATH environment variable. The test suite should not rely on that being the case, so ./MIM is appropriate.
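In a Python test harness, the point about `$PATH` amounts to always naming the binary with a path, as in this sketch. The helper names are hypothetical, and it assumes the test runs the binary from the directory that contains it; on POSIX systems `subprocess.run` resolves a relative executable such as `./MIM` against `cwd`.

```python
import os
import subprocess
from pathlib import Path


def mim_command(run_dir: Path) -> list[str]:
    """Argv for the MIM binary in run_dir that works whether or not '.' is on $PATH.

    A bare "MIM" would be searched for in the $PATH directories only; "./MIM"
    names the file in the working directory explicitly.
    """
    exe = Path(run_dir) / "MIM"
    if not (exe.is_file() and os.access(exe, os.X_OK)):
        raise FileNotFoundError(f"no executable MIM in {run_dir}")
    return ["./MIM"]


def run_mim(run_dir: Path) -> None:
    # Setting cwd makes "./MIM" resolve inside run_dir on POSIX systems.
    subprocess.run(mim_command(run_dir), cwd=run_dir, check=True)
```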

axch commented 7 years ago

Does valgrind produce a failure exit status due to that memory leak? Paste the test runner output?

edoddridge commented 7 years ago

valgrind does not produce an error exit status: the test suite does not log a failure for the leak, and the following output is only displayed when the test suite detects a failure.

The output from valgrind is:

==71978== Memcheck, a memory error detector
==71978== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==71978== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==71978== Command: ./MIM
==71978== STOP 0
==71978==
==71978== HEAP SUMMARY:
==71978== in use at exit: 22,263 bytes in 194 blocks
==71978== total heap usage: 6,293 allocs, 6,099 frees, 11,069,230 bytes allocated
==71978==
==71978== LEAK SUMMARY:
==71978== definitely lost: 80 bytes in 2 blocks
==71978== indirectly lost: 0 bytes in 0 blocks
==71978== possibly lost: 0 bytes in 0 blocks
==71978== still reachable: 0 bytes in 0 blocks
==71978== suppressed: 22,183 bytes in 192 blocks
==71978== Rerun with --leak-check=full to see details of leaked memory
==71978==
==71978== For counts of detected and suppressed errors, rerun with: -v
==71978== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

axch commented 7 years ago

OK. Then I wouldn't worry about it. Memory leaks are less serious than accesses to unallocated memory, so we can defer looking for them systematically. This one in particular is small enough to ignore.
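On the exit-status question: by default valgrind exits with the program's own status, and with the default summary leak check, leaks are not counted as errors, which matches the `ERROR SUMMARY: 0 errors` line above despite the 80 bytes definitely lost. A sketch of how a test runner could make valgrind's exit status meaningful, using valgrind's `--error-exitcode`, `--leak-check=full`, and `--errors-for-leak-kinds` options; the helper name and leaving leak checking off by default (following the decision to ignore this leak) are assumptions.

```python
def valgrind_command(binary: str = "./MIM", fail_on_leaks: bool = False) -> list[str]:
    """Build a valgrind invocation whose exit status reflects memory errors.

    --error-exitcode makes valgrind exit with 1 when it detects errors, instead of
    passing through the program's own status. Leaks only count as errors in a full
    leak check, so they are opt-in here.
    """
    cmd = ["valgrind", "--error-exitcode=1"]
    if fail_on_leaks:
        # Count only "definitely lost" blocks as errors, like the 80 bytes reported above.
        cmd += ["--leak-check=full", "--errors-for-leak-kinds=definite"]
    cmd.append(binary)
    return cmd
```

A runner would then treat a nonzero `subprocess.run(valgrind_command(), cwd=run_dir).returncode` as a failure, keeping in mind that a nonzero status can also come from MIM itself.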

axch commented 7 years ago

@edoddridge does the refactoring suite pass on your machine now? Can we close this issue as done, at least for the time being?

edoddridge commented 7 years ago

The test suite passes on my laptop, so I agree we can close this.

edoddridge/aronnax, issue #27