Closed bcaddy closed 7 months ago
Maybe instead of having really short test problems we can put in an early exit? Either a runtime or compile-time option that has Cholla exit gracefully after 5 time steps or something.
Edit: We already have this in the form of the N_STEPS_LIMIT macro.
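For anyone unfamiliar with the idea, here is a minimal sketch of what a compile-time step limit looks like in a time-stepping loop. This is not Cholla's actual main loop: `RunSummary`, `run_main_loop`, and the default of 5 are made up for illustration, and only the macro name `N_STEPS_LIMIT` comes from the discussion above.

```cpp
// Assumption: a default of 5 steps when the macro is not set at compile time.
#ifndef N_STEPS_LIMIT
#define N_STEPS_LIMIT 5
#endif

// Hypothetical summary of a run, used only in this sketch.
struct RunSummary {
  int steps_taken;
  double final_time;
  bool hit_step_limit;
};

// Advance a fake simulation clock until t_out or until the step limit is hit.
RunSummary run_main_loop(double t_out, double dt)
{
  RunSummary summary{0, 0.0, false};
  while (summary.final_time < t_out) {
    // Early exit: leave the loop cleanly so normal shutdown code still runs.
    if (summary.steps_taken >= N_STEPS_LIMIT) {
      summary.hit_step_limit = true;
      break;
    }
    summary.final_time += dt;  // stand-in for the real hydro update
    ++summary.steps_taken;
  }
  return summary;
}
```

Because the loop exits with `break` rather than aborting, any cleanup and deallocation after the loop still runs, which matters when the sanitizer is also checking for leaks.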
Currently the init checks have been "solved" by just initializing all memory when allocated. This doesn't actually solve the problem of reading uninitialized memory though, it just hides it.
@evaneschneider
FYI, the CUDA compute sanitizer can find unused allocations (assuming we remove the bulk initialization). A quick inspection shows that the integrator intermediate arrays (flux, interfaces, etc) are underutilized by 1-10% depending on the array. With some clever indexing we could definitely free up some memory.
I'd like to add some compute sanitizer runs to our automated testing at some point to catch potential errors in the CUDA code. To do this we would need to address the issues raised in #197 and have extremely small test problems to run on for each build type.
There's a script for running the compute sanitizer in the tools directory:
tools/cholla-nv-compute-sanitizer.sh
Issue #197
I think I've resolved the init check issues here but not the rest. We should check and get those issues resolved.
Small Test Problems
To run the compute sanitizer we need to actually run Cholla on some problem. The trick is that some of the compute sanitizer checks, namely memcheck, cause the code to run extremely slowly, of order 30 s per time step. Since most of the code is the same from time step to time step, we need simple problems for each build type that only run for a handful of time steps. In the case of hydro this could be a very low resolution Sod tube with a limited run time. A similar MHD shock tube would work for MHD, but I don't know about the other build types.
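For reference, a very low resolution Sod tube is cheap to set up. The sketch below fills a 1D array of primitive variables with the standard Sod initial conditions (left state density 1, pressure 1; right state density 0.125, pressure 0.1; both at rest) on a unit domain. The `Primitive` struct and `sod_initial_conditions` function are hypothetical and are not how Cholla reads its parameter files.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the primitive variables in one cell.
struct Primitive {
  double density;
  double velocity;
  double pressure;
};

// Standard Sod shock tube on the domain [0, 1] with the diaphragm at x = 0.5 by default.
std::vector<Primitive> sod_initial_conditions(std::size_t n_cells, double diaphragm = 0.5)
{
  std::vector<Primitive> cells(n_cells);
  for (std::size_t i = 0; i < n_cells; ++i) {
    // Cell-centered position of cell i.
    const double x = (static_cast<double>(i) + 0.5) / static_cast<double>(n_cells);
    cells[i] = (x < diaphragm) ? Primitive{1.0, 0.0, 1.0}      // left state
                               : Primitive{0.125, 0.0, 0.1};   // right state
  }
  return cells;
}
```

Something like 16 cells is enough to exercise the integrator code paths. At roughly 30 s per step under memcheck, a 5 step run would take about two and a half minutes per build type.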