acts-project / vecmem

Vectorised data model base and helper classes.
https://acts-project.github.io/vecmem/
Mozilla Public License 2.0
19 stars 13 forks source link

Increase the robustness of `device_atomic_ref` #275

Closed stephenswat closed 5 months ago

stephenswat commented 5 months ago

While working on https://github.com/acts-project/traccc/pull/595, I found out that the vecmem implementation of atomic CAS is fundamentally broken on CUDA platforms :worried:. Currently, the functionality is compare_exchange_strong is broken because it relies on the CUDA atomicCAS builtin which functions fundamentally differently from the C++ STL version of the equivalent code. Indeed, the C++ version returns true on a succesful swap and false otherwise. The CUDA version always returns the old value. As such, if the old value is false-like, e.g. 0, the compare_exchange_strong function will always appear to fail, even if it succeeded. This commit fixes the above issue.

I also removed the backup implementation of CAS as it was not atomic in any way and was basically lying to users about working atomically :worried:.

krasznaa commented 5 months ago

As you noticed, one cannot just use static_assert(...) in a "non-templated" function of a templated class. As soon as the class is instantiated, the assertion kicks in. It doesn't only happen when the function is called. :frowning:

If you forego the removal of the naive, non-atomic implementation, then I'll be happy to get this fix in. But I'd rather not open the can of worms with how vecmem::device_atomic_ref should behave on the host before C++20. :frowning: I don't think that will lead us anywhere useful.

stephenswat commented 5 months ago

My bad, I had hoped that any of the templates would have been on the function, not on the class, but sadly not. Anyway there will be a bit more work to do here anyway, so I'll come up with a more comprehensive solution.

krasznaa commented 5 months ago

Note that I've been thinking for a while now about introducing cuda::atomic_ref in this code. :thinking: Similar to how we use sycl::atomic_ref, "under the right circumstances" the code should just use cuda::atomic_ref, as is. You should check if you could make that happen.

stephenswat commented 5 months ago

Okay, the scope of this PR has grown a little bit to fix a whole bunch of other issues with the atomic references. Also adds additional compile-time checks on the functionality of atomic references as well as runtime tests.

stephenswat commented 5 months ago

Okay, let's see what the MSVC CI thinks of this.

stephenswat commented 5 months ago

Okay so MSVC doesn't support atomics on unsigned integers. :laughing:

stephenswat commented 5 months ago

Someone explain to me how this commit breaks the synchronized memory resource on release builds in MSVC and in those builds alone.

stephenswat commented 5 months ago

Ah, of course.

Including intrin.h breaks locks and mutexes.

https://github.com/stephenswat/vecmem/actions/runs/9440139525/job/25999074005

:clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face:

stephenswat commented 5 months ago

I got rid of the MSVC intrinsics and replaced them by a non-atomic implementation of CAS, but at least now there is only one point of fake atomicity in this code.