eagles-project / mam4xx

A C++ implementation of MAM4
https://eagles-project.github.io/mam4xx/
Other
7 stars 6 forks source link

SIMD (Pack) logic in conversions.hpp needs to be hardened/handled #6

Closed jeff-cohere closed 1 year ago

jeff-cohere commented 2 years ago

From @overfelt :

For those little functions in conversions.hpp where division takes place, there will be a divide by zero for SIMD types that do not fill up all the values and contain some garbage. There are times when trying to track down the origin of a NaN where this is inconvenient. If there were some way in a debug build where efficiency is not so critical that NaNs could be avoided, it might be useful.

pbosler commented 2 years ago

After #28, our views will be initialized to zero except for padding (when pack size > 1), in which case the padded entries will still be nans. This makes a simple EKAT_KERNEL_ASSERT( (input_arg > 0).all() ) not so useful, since it will fail on the nans, even when we want those nans.

An option is to replace possibly problematic division by x with multiplication by haero::FloatingPoint<PackType>::safe_denominator(x), which won't introduce any new nan values. The disadvantages are that it won't detect divide-by-zero, and it may increase floating point roundoff error.

@jeff-cohere , @overfelt if this is a strategy we like, I can do it -- I need to add a few more functions to conversions.hpp (to and from the modal diameter & volume) as part of water uptake work anyway, so I'll be working on that file soon.

pbosler commented 2 years ago

Whatever we decide, we should consider adding a CI build that turns on EKAT's FPE utilities.

-DEKAT_ENABLE_FPE=ON -DEKAT_ENABLE_FPE_DEFAULT_MASK=ON

Looking at ekat/CMakeLists.txt lines 56--59, it looks like they should be on by default, but in all of my builds they're turned off. We know they don't work with Mac, but even my linux builds have them turned off.

Edit: The default ekat CMake options set EKAT_ENABLE_FPE=ON, but then sets the default FPE mask to 0, which effectively turns them off in the impl file ekat_arch.cpp lines 50--73. So, I think, to turn them on we'll have to set both EKAT_ENABLE_FPE=ON and an FPE mask.

The default mask turns on divide-by-zero, invalid value, and overflow. That covers everything we'd need, I think.

jeff-cohere commented 2 years ago

We could switch FPE on for Linux Debug builds in Haero's build system by setting the appropriate variables in Haero's ext/CMakeLists.txt. That would make sure we always got floating point exceptions when we expected them on Linux. Thoughts?

overfelt commented 2 years ago

For the regression tests we could try turning on FPE and just see how many failures there are. If the number of levels in a column is a multiple of the pack size, then we should not have unit tests that get NaNs. If we decide a test is a legit NaN, we could turn off the checking at the test level and avoid setting FPE detection in user code.

jeff-cohere commented 1 year ago

This is a non-issue now that we've given Packs the boot. 👢