Closed stephenswat closed 5 months ago
As you noticed, one cannot just use static_assert(...)
in a "non-templated" function of a templated class. As soon as the class is instantiated, the assertion kicks in. It doesn't only happen when the function is called. :frowning:
If you forego the removal of the naive, non-atomic implementation, then I'll be happy to get this fix in. But I'd rather not open the can of worms with how vecmem::device_atomic_ref
should behave on the host before C++20. :frowning: I don't think that will lead us anywhere useful.
My bad, I had hoped that any of the templates would have been on the function, not on the class, but sadly not. Anyway there will be a bit more work to do here anyway, so I'll come up with a more comprehensive solution.
Note that I've been thinking for a while now about introducing cuda::atomic_ref
in this code. :thinking: Similar to how we use sycl::atomic_ref
, "under the right circumstances" the code should just use cuda::atomic_ref
, as is. You should check if you could make that happen.
Okay, the scope of this PR has grown a little bit to fix a whole bunch of other issues with the atomic references. Also adds additional compile-time checks on the functionality of atomic references as well as runtime tests.
Okay, let's see what the MSVC CI thinks of this.
Okay so MSVC doesn't support atomics on unsigned integers. :laughing:
Someone explain to me how this commit breaks the synchronized memory resource on release builds in MSVC and in those builds alone.
Ah, of course.
Including intrin.h
breaks locks and mutexes.
https://github.com/stephenswat/vecmem/actions/runs/9440139525/job/25999074005
:clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face: :clown_face:
I got rid of the MSVC intrinsics and replaced them by a non-atomic implementation of CAS, but at least now there is only one point of fake atomicity in this code.
While working on https://github.com/acts-project/traccc/pull/595, I found out that the vecmem implementation of atomic CAS is fundamentally broken on CUDA platforms :worried:. Currently, the functionality is
compare_exchange_strong
is broken because it relies on the CUDAatomicCAS
builtin which functions fundamentally differently from the C++ STL version of the equivalent code. Indeed, the C++ version returns true on a succesful swap and false otherwise. The CUDA version always returns the old value. As such, if the old value is false-like, e.g. 0, thecompare_exchange_strong
function will always appear to fail, even if it succeeded. This commit fixes the above issue.I also removed the backup implementation of CAS as it was not atomic in any way and was basically lying to users about working atomically :worried:.