Open psychocoderHPC opened 6 years ago
From a programmer's point of view, case 4 would be the best one, because sometimes it is useful to mix `atomicAdd` and `atomicExch`.
In the current alpaka implementation, case 4 is fulfilled except for `AtomicOmpCritSec`, where compare-and-swap is not atomic with respect to all other operations.
I am currently implementing fast atomic operations for `AtomicOmpCritSec` and found that `atomicMin`, `atomicMax`, and compare-and-swap cannot be implemented to fulfill case 4 :-( without a for loop, and then only for fundamental types.
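One possible reading of the "loop" mentioned above is a compare-and-swap retry loop. The sketch below uses `std::atomic` as a stand-in rather than OpenMP directives or alpaka's real interface; `atomic_min_cas` and `demo_atomic_min` are hypothetical names chosen for illustration.

```cpp
#include <atomic>

// Hypothetical helper (not alpaka's API): atomic minimum built from a CAS retry loop.
// Returns the value that was observed at `target` before the update.
template <typename T>
T atomic_min_cas(std::atomic<T>& target, T value)
{
    T old = target.load();
    // Stop when the stored value is already <= value, or when our exchange succeeds.
    // On failure, compare_exchange_weak reloads the current value into `old`.
    while (value < old && !target.compare_exchange_weak(old, value))
    {
    }
    return old;
}

// Small demo: all iterations (parallel if OpenMP is enabled) fold into one minimum.
int demo_atomic_min()
{
    std::atomic<int> minimum{1000};
    #pragma omp parallel for
    for (int i = 0; i < 256; ++i)
    {
        atomic_min_cas(minimum, 500 - i);
    }
    return minimum.load();
}
```

Because every access goes through the same `std::atomic` object, this minimum stays atomic with respect to other operations on that object. The restriction to fundamental (lock-free) types noted above applies here as well.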
To allow the user to specialize the atomic operation for their own type, e.g. by using an `omp critical` section, the implementation can then only fulfill case 2.
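A minimal sketch of such a user specialization, assuming a hypothetical `Vec2` type and a hypothetical free function `atomic_add_vec2` (neither is part of alpaka):

```cpp
// Hypothetical user-defined type; name and layout are assumptions for illustration.
struct Vec2
{
    double x;
    double y;
};

// Hypothetical "atomic add" for Vec2, guarded by a named critical section.
// It only excludes other code entering the same named critical section,
// so it is not atomic against e.g. an `omp atomic` update of the same memory.
inline Vec2 atomic_add_vec2(Vec2* addr, Vec2 const& value)
{
    Vec2 old;
    #pragma omp critical(vec2_atomic_add)
    {
        old = *addr;
        addr->x += value.x;
        addr->y += value.y;
    }
    return old;
}

// Small demo: 100 additions of (1, 2), in parallel if OpenMP is enabled.
Vec2 demo_vec2_sum()
{
    Vec2 sum{0.0, 0.0};
    #pragma omp parallel for
    for (int i = 0; i < 100; ++i)
    {
        atomic_add_vec2(&sum, Vec2{1.0, 2.0});
    }
    return sum;
}
```

The critical section protects the operation only against itself, which is why this style of specialization ends up in case 2 rather than case 4.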
What is the behaviour in CUDA? We should not aim for more than CUDA is guaranteeing.
IMO case 4 is fulfilled for CUDA; I can't find any source that says otherwise. Testing this is not easy: due to the nature of race conditions, every positive result must be treated as a possible false positive, so only a negative (failing) test is conclusive.
If I implement `AtomicOmpCritSec` with `omp atomic capture`, it will be faster than the new `AtomicStlLock` (#398), but it only supports fundamental types.
btw: CUDA also supports only a handful of types.
I think it is safe to define atomics in alpaka as in case 4, as long as CUDA also only supports `float`, `double`, `int`, and `unsigned int`.
I would expect CUDA to implement Case 3 but we might never find out.
Not sure. Case 3 means that you can mix 64-bit and 32-bit types, because only the address counts.
Is there anything against the case 4 definition from your side? If not I will implement the new omp atomics as case 4.
e.g.
T old;
auto & ref(*addr);
// atomically update ref, but capture original value in old
#pragma omp atomic capture
{ old = ref; ref += value; }
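For context, the fragment above could be wrapped into a complete function roughly as follows. This is only a sketch: `atomic_add_capture` and `demo_counter` are hypothetical names, not alpaka's actual interface, and `T` is assumed to be a fundamental scalar type.

```cpp
// Hypothetical wrapper (not alpaka's interface): fetch-and-add via omp atomic capture.
// T is assumed to be a fundamental scalar type, as required by `omp atomic`.
template <typename T>
T atomic_add_capture(T* addr, T value)
{
    T old;
    auto& ref(*addr);
    // Atomically update ref, but capture the original value in old.
    #pragma omp atomic capture
    { old = ref; ref += value; }
    return old;
}

// Small demo: 1000 increments of a shared counter, in parallel if OpenMP is enabled.
int demo_counter()
{
    int counter = 0;
    #pragma omp parallel for
    for (int i = 0; i < 1000; ++i)
    {
        atomic_add_capture(&counter, 1);
    }
    return counter;
}
```

Compiled without OpenMP support the pragmas are ignored and the loop runs serially, so the function still returns the same result.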
As of now, I see nothing that speaks against it
@psychocoderHPC Was this fixed by #664?
What is the definition of an atomic operation in alpaka? Is an atomic operation only atomic with respect to the same operation type (meaning `+` and `-` to the same address are not safe), or are all atomic operations to an address atomic independent of the operation (access and operation on the same address are guaranteed to be thread safe)?

- Case 1: only the same kind of operation is thread safe
- Case 2: only the same kind and type used within the operation is thread safe
- Case 3: the access to a memory address is thread safe independent of the operation
- Case 4: the access to a memory address of the same type is thread safe independent of the operation
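To make the question concrete, here is a small sketch that mixes two different operations (`+` and `-`) on the same address and the same type, using plain OpenMP `atomic` directives. The function name `demo_mixed_ops` is hypothetical and the code does not use alpaka itself.

```cpp
// Hypothetical demo: mix atomic increments and decrements on one address.
// If different operations on the same address are atomic with respect to
// each other (cases 3 and 4), the updates cancel out exactly.
int demo_mixed_ops()
{
    int value = 0;
    #pragma omp parallel for
    for (int i = 0; i < 2000; ++i)
    {
        if (i % 2 == 0)
        {
            #pragma omp atomic
            value += 1;
        }
        else
        {
            #pragma omp atomic
            value -= 1;
        }
    }
    return value;
}
```

A result of 0 is consistent with cases 3 and 4 but does not prove them, since a race may simply not have occurred in that run. Under case 1 or case 2 an implementation would be free to lose updates here.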