Alpine-DAV / ascent

A flyweight in situ visualization and analysis runtime for multi-physics HPC simulations
https://alpine-dav.github.io/ascent/

2024/04 VTK-m Questions #1270

Open cyrush opened 2 months ago

cyrush commented 2 months ago

1) Device Usage (related to: https://github.com/Alpine-DAV/ascent/discussions/1267)

Ascent has logic to force a specific device or fail. Can we set a precedence chain for devices? How do we know for sure which device was used for an execution?

2) AMReX folks are seeing a crash on finalization:

https://github.com/AMReX-Codes/amrex/issues/3873

From the stack trace, it appears to be a shared pointer cleanup of vtkm::cont::RuntimeDeviceTracker. Could there be a C++ static finalization issue? Is there a singleton pattern here?

nicolemarsaglia commented 2 months ago
  3. Status of VTK-m updating to Kokkos 4.2 and status of the next VTK-m release (2.1.1 or 2.2)
kmorel commented 2 months ago
  1. You can force VTK-m to use a particular device through vtkm::cont::RuntimeDeviceTracker. The safest way to do this is to create a vtkm::cont::ScopedRuntimeDeviceTracker to force a particular device, which will remain in effect as long as the object stays in scope (see this part of the user's guide). Since you want this behavior to always be in effect, you can force the device everywhere (sort of) by calling vtkm::cont::GetRuntimeDeviceTracker().ForceDevice(...) (see this part of the user's guide). The setting is thread-specific, so if things are multi-threaded, make sure you do this early on a thread that will stick around (like on the main thread right after calling vtkm::cont::Initialize()).

    There is currently no way to verify which device something ran on. I would like to implement something for that, but we don't have it yet.

  2. That's not good. There is indeed a singleton pattern in vtkm::cont::GetRuntimeDeviceTracker. It's made more complicated by the fact that the singleton is thread local. The pattern looks pretty safe, but perhaps we are running into the same problem as reported in this Stack Overflow question. Perhaps something in a global variable destructor is calling GetRuntimeDeviceTracker, which is creating a thread-local variable that immediately gets destroyed. Are you using CUDA? The most likely culprit I could find was in the function to free memory in the CUDA device adapter (not the Kokkos one).

    It might help to call vtkm::cont::GetRuntimeDeviceTracker() right after vtkm::cont::Initialize() assuming that this happens early in the program and on the main thread. I might have to rethink how the singleton is managed.

  3. I'm not sure when we are planning to release the next version. Things slowed down quite a bit after ECP ended. I'm sure we could start the process if you could use a new release. @vicentebolea should be able to provide more details.
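For readers following points 1 and 2, here is a minimal stand-alone sketch of the two ideas discussed there: a thread-local singleton that is built lazily on first use, and a scoped guard that forces a device only while it is in scope. It deliberately does not use VTK-m. `Device`, `Tracker`, `GetTracker`, `ScopedForceDevice`, and `ConstructionCount` are hypothetical names invented for illustration, and this is not how vtkm::cont::RuntimeDeviceTracker is actually implemented. Whether touching the singleton early avoids the AMReX finalization crash is an open question in this thread; the sketch only shows what "touching it early" means.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical stand-in for a device id; VTK-m itself uses vtkm::cont::DeviceAdapterId.
enum class Device { Any, Serial, OpenMP, Cuda };

// Hypothetical stand-in for the tracker; it only records which device is forced.
struct Tracker
{
  Device Forced = Device::Any;
};

// Counts tracker constructions across all threads (for illustration only).
inline std::atomic<int>& ConstructionCount()
{
  static std::atomic<int> count{ 0 };
  return count;
}

// Thread-local singleton: each thread lazily builds its own tracker on first use.
inline Tracker& GetTracker()
{
  thread_local std::shared_ptr<Tracker> instance = [] {
    ++ConstructionCount();
    return std::make_shared<Tracker>();
  }();
  return *instance;
}

// RAII guard: forces a device on the calling thread and restores the
// previous setting when it leaves scope.
class ScopedForceDevice
{
public:
  explicit ScopedForceDevice(Device device)
    : Previous(GetTracker().Forced)
  {
    GetTracker().Forced = device;
  }
  ~ScopedForceDevice() { GetTracker().Forced = this->Previous; }
  ScopedForceDevice(const ScopedForceDevice&) = delete;
  ScopedForceDevice& operator=(const ScopedForceDevice&) = delete;

private:
  Device Previous;
};

// Usage: touch the singleton early, force a device for the rest of the
// thread's lifetime, then override it temporarily inside a scope.
inline void ExampleUsage()
{
  Tracker& tracker = GetTracker(); // early first use on this thread
  tracker.Forced = Device::Serial; // plays the role of ForceDevice(...)
  {
    ScopedForceDevice guard(Device::Cuda);
    assert(GetTracker().Forced == Device::Cuda);
  }
  assert(GetTracker().Forced == Device::Serial); // previous setting restored
  assert(&tracker == &GetTracker());             // same instance on this thread
}
```

Because the setting lives in a thread-local object, a value forced on one thread is not visible from another thread's tracker in this sketch, which is why the advice above is to force the device early on a thread that sticks around.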

vicentebolea commented 2 months ago

> I'm not sure when we are planning to release the next version. Things slowed down quite a bit after ECP ended. I'm sure we could start the process if you could use a new release. @vicentebolea should be able to provide more details.

I can definitely help with this if you are ready for a release. However, there does not appear to be much new material in master since the last release (2.1). We could make a 2.1.1 with the bug fixes we have had since then.

nicolemarsaglia commented 2 months ago

@vicentebolea @kmorel thanks for the information! It would be great to get a 2.1.1 at some point. Right now we have a branch that uses the MergeDataSets filter, which was added after the 2.1 release.

kmorel commented 2 months ago

I would argue that the introduction of a new filter is more than just a patch release. We should just create a 2.2 release even if there are not a huge number of new features.