LLNL / RAJA

RAJA Performance Portability Layer (C++)
BSD 3-Clause "New" or "Revised" License
474 stars 102 forks source link

Remove volatile from atomics #1672

Closed adayton1 closed 3 months ago

adayton1 commented 3 months ago

Summary

adayton1 commented 3 months ago

@MrBurmark, would you be willing to look at the changes to hip atomics? The previous implementation of atomicCAS relied on volatile, so I changed the implementation to use an atomic load. To avoid a circular dependency, I've changed in the implementation of atomicLoad to use the intrinsic if available, otherwise I fall back to atomicOr(address, 0). Does that make sense, or would it be better to fall back to atomicCAS(address, 0, 0) or something else? I think atomicOr will be better than atomicAdd, but I'm not sure if atomicCAS can avoid the write in some cases. I also modified atomicExchange to use reinterpret casting instead of atomicCAS in a loop, then made atomicStore use atomicExchange if the intrinsic is not available.

If these changes make sense, then I will do basically the same thing in CUDA and then we can fully get rid of volatile.

MrBurmark commented 3 months ago

Its best to avoid using atomicCAS(0,0) where possible as its slower than doing something like atomicOr(0) or atomicAdd(0). Between atomicOr(0) or atomicAdd(0) I don't know which is faster, maybe #1624 could provide some insight?

adayton1 commented 3 months ago

I'll be out most of this afternoon and all day tomorrow, so if the tests come back passing, feel free to merge when you think it is ready.

MrBurmark commented 3 months ago

This is looking pretty good. You double checked which types were available in which hardware for cuda and hip? It looks like the cuda and hip backend are more similar now but they are still a bit different on which types are supported for which hardware.

adayton1 commented 3 months ago

This is looking pretty good. You double checked which types were available in which hardware for cuda and hip? It looks like the cuda and hip backend are more similar now but they are still a bit different on which types are supported for which hardware.

Yeah, I double checked the types. The main differences are that Hip doesn't provide an atomicInc or atomicDec, and CUDA supports an additional type for atomicMin and atomicMax.

adayton1 commented 3 months ago

I keep hitting unrelated errors in the CI:

[info: cloning spack develop branch from github] [exe: git clone --single-branch --depth=1 -b develop-2024-05-26 https://github.com/spack/spack.git spack] Cloning into 'spack'... error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 3939 bytes of body are still expected fatal: the remote end hung up unexpectedly fatal: early EOF fatal: index-pack failed [spack python: /bin/sh: /dev/shm/lassen53-1939417/spack/bin/spack: No such file or directory] [Checking for concretizer options...] [disabling config scope (except defaults) in: /dev/shm/lassen53-1939417/spack/lib/spack/spack/config.py] Traceback (most recent call last): File "./scripts/uberenv/uberenv.py", line 1389, in sys.exit(main()) File "./scripts/uberenv/uberenv.py", line 1335, in main env.patch() File "./scripts/uberenv/uberenv.py", line 892, in patch self.disable_spack_config_scopes() File "./scripts/uberenv/uberenv.py", line 862, in disable_spack_config_scopes cfg_script = open(spack_lib_config).read() FileNotFoundError: [Errno 2] No such file or directory: '/dev/shm/lassen53-1939417/spack/lib/spack/spack/config.py'