robertsawko opened this issue 3 months ago
Is this running on a GPU?
Some Ascent actions (data binning, in particular) assume that GPU memory is accessible from the CPU (as is the case on Summit, Frontier, and other unified memory systems).
@cyrush Cyrus, do you have any suggestions?
Thanks for the quick replies.
Is this running on a GPU? No, this is an older, CPU-only system.
The really surprising thing is that everything else seems to work just fine. I am still trying to reproduce the error with the AMReX tutorials. My colleague got this screenshot with the heat equation tutorial:
I think I misunderstood your original issue.
Your issue actually seems very similar to a different issue I reported here: https://github.com/AMReX-Codes/amrex/issues/2994. I never fully understood why that happened, but it seemed to be caused, somehow, by an unrelated global variable.
I cannot reproduce the issue with the heat equation test. This is what I did:
$ spack install ascent
$ cd amrex-tutorials/ExampleCodes/Blueprint/HeatEquation_EX1_C/Exec
$ make -j DEBUG=FALSE USE_CONDUIT=TRUE USE_ASCENT=TRUE CONDUIT_DIR=/path/to/spack-installed-conduit ASCENT_DIR=/path/to/spack-installed-ascent
$ ./main2d.gnu.ex inputs_2d
I also tried DEBUG=TRUE.
@robertsawko, do you have any custom classes like the ones mentioned in https://github.com/AMReX-Codes/amrex/issues/2994?
If it's an issue with C++ static initialization and finalization, those are very hard to reason about (the order in which such objects are constructed and destroyed across translation units is not guaranteed).
I will look into vtkm::cont::RuntimeDeviceTracker to see if it could be subject to something like this.
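To make the static initialization/finalization concern concrete, here is a minimal, self-contained C++ sketch. It is not taken from AMReX, Ascent, or VTK-m; `Registry`, `registry()` and `Client` are hypothetical names standing in for a library-wide singleton and a global object that uses it during teardown. A plain namespace-scope `Registry` would be destroyed after `main()` returns, in reverse order of construction, so another global's destructor could touch it after it is already gone. The "construct on first use" variant below deliberately never destroys the singleton, so it stays valid while other static destructors run.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a library-wide singleton (e.g. some global tracker).
struct Registry {
    std::vector<std::string> names;

    void add(const std::string& n) { names.push_back(n); }

    void remove(const std::string& n) {
        for (auto it = names.begin(); it != names.end(); ++it) {
            if (*it == n) {
                names.erase(it);
                return;
            }
        }
    }
};

// Hazardous alternative (left commented out): a namespace-scope global whose
// initialization order relative to globals in other translation units is
// unspecified, and which may be destroyed before objects that still use it
// in their own destructors.
// Registry g_registry;

// Safer pattern ("construct on first use"): created on the first call and
// intentionally leaked, so it is never destroyed and remains usable while
// other static objects are torn down after main() returns.
Registry& registry() {
    static Registry* instance = new Registry();  // deliberately never deleted
    return *instance;
}

// Hypothetical object with static storage duration whose destructor
// touches the registry during program shutdown.
struct Client {
    std::string name;

    explicit Client(std::string n) : name(std::move(n)) { registry().add(name); }

    ~Client() { registry().remove(name); }  // safe: registry() is never destroyed
};

Client g_client("global-client");
```

This only illustrates the class of bug (a crash in a destructor that runs after the main work, including AMReX's finalization, has completed); whether it applies here depends on which globals the application, AMReX, Ascent and VTK-m actually define.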
I am really sorry for taking so long to respond. Our HPC system has serious I/O issues (which is exactly why working on in situ is so relevant!), and right now it is simply unusable. If it is still like this next week, I will reproduce the environment locally and retry.
Hello,
Together with two colleagues, I have been using AMReX's built-in Ascent integration. We closely followed the three Blueprint tutorials and implemented a function that takes our finest mesh level, converts it to a Conduit Mesh Blueprint with SingleLevelToBlueprint, and passes it to Ascent along with some actions to execute. The simulations run fine and generate images as expected, but end with a segfault:
This is strange to me in several ways. First, AMReX itself appears to finalize cleanly, whereas I am used to segfaults killing a program while it is still running.
The problem occurs in both parallel and serial runs. I cannot reproduce it with the heat equation tutorial, but one of my colleagues reports seeing something similar with the 2D heat equation as well.
The code itself is not even very interesting:
I've run a backtrace on the core dump, but I am still none the wiser:
Could you please give us any suggestions as to what might be going wrong?