alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

definition of atomic operations #399

Open psychocoderHPC opened 6 years ago

psychocoderHPC commented 6 years ago

What is the definition of an atomic operation in alpaka? Is an atomic operation only atomic with respect to the same operation type (meaning + and - on the same address are not safe), or are all atomic operations on an address atomic independent of the operation (any access and operation on the same address is guaranteed to be thread safe)?

Case 1: only the same kind of operation is thread safe

Case 2: only the same kind and type used within the operation is thread safe

Case 3: the access to a memory address is thread safe independent of the operation

Case 4: the access to a memory address of the same type is thread safe independent of the operation

psychocoderHPC commented 6 years ago

From a programmer's point of view, case 4 would be the best one, because it is sometimes useful to mix atomicAdd and atomicExch.

In the current alpaka implementation, case 4 is fulfilled everywhere except in AtomicOmpCritSec, where compare-and-swap is not atomic with respect to all other operations.
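To make the case 4 use case concrete, here is a minimal sketch of mixing an add and an exchange on the same address. The helpers `fetchAdd`, `exchange` and `drainWhileAdding` are hypothetical names for illustration, not alpaka API, and the sketch assumes an OpenMP 4.0 compiler (build with `-fopenmp`; without it the pragmas are ignored and the loop runs serially).

```cpp
#include <cassert>

// Hypothetical helpers (not alpaka API). Both return the previous value,
// in the style of CUDA's atomicAdd / atomicExch.
inline long fetchAdd(long* addr, long value)
{
    long old;
    #pragma omp atomic capture
    { old = *addr; *addr += value; }
    return old;
}

inline long exchange(long* addr, long value)
{
    long old;
    #pragma omp atomic capture
    { old = *addr; *addr = value; }
    return old;
}

// Every iteration adds 1 to `acc`; every 100th iteration also drains `acc`
// with an exchange and moves the drained amount into `drained`.
// If add and exchange are atomic with respect to each other (case 4),
// no increment is lost and the function returns `iterations`.
long drainWhileAdding(int iterations)
{
    assert(iterations >= 0);
    long acc = 0;      // shared accumulator
    long drained = 0;  // sum of everything taken out of acc
    #pragma omp parallel for
    for (int i = 0; i < iterations; ++i)
    {
        fetchAdd(&acc, 1);
        if (i % 100 == 0)
        {
            long const taken = exchange(&acc, 0);
            fetchAdd(&drained, taken);
        }
    }
    return drained + acc;
}
```

Calling `drainWhileAdding(100000)` should give back `100000`. As with any race test, a correct result does not prove the guarantee; only a wrong result would be conclusive.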

psychocoderHPC commented 6 years ago

I am currently implementing fast atomic operations for AtomicOmpCritSec and found that atomicMin, atomicMax and compare-and-swap cannot be implemented to fulfill case 4 :-( without a for loop, and then only for fundamental types.
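The "for loop" variant is presumably a compare-and-swap retry loop. A minimal sketch, assuming `std::atomic` as a stand-in for the underlying CAS primitive (the names `fetchMin` and `parallelMin` are hypothetical, and using `std::atomic` inside an OpenMP region is an assumption that holds on common compilers, not something the thread confirms):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not alpaka API): atomic minimum built from a
// compare-and-swap retry loop. Returns the value seen before the update.
template <typename T>
T fetchMin(std::atomic<T>& target, T value)
{
    T old = target.load();
    // Stop when the stored value is already <= value, or when our CAS wins.
    // On failure, compare_exchange_weak refreshes `old` with the current value.
    while (value < old && !target.compare_exchange_weak(old, value))
    {
    }
    return old;
}

// Feeds `count` values in [0, count) through fetchMin from a parallel loop.
int parallelMin(int count)
{
    assert(count > 0);
    std::atomic<int> smallest{count};
    #pragma omp parallel for
    for (int i = 0; i < count; ++i)
    {
        // (i * 7919) % count is only an arbitrary spread of inputs.
        fetchMin(smallest, (i * 7919) % count);
    }
    return smallest.load();
}
```

Because every update goes through the same CAS on the same address, this stays atomic with respect to other atomic operations on that address, but it relies on the type fitting into a hardware CAS, which is why it is limited to fundamental types.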

If we allow the user to specialize the atomic operation for their own type, e.g. by using an omp critical section, then it can only fulfill case 2.
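A minimal sketch of such a user specialization, assuming a hypothetical type `Vec2` and a hypothetical helper `criticalAdd` (neither is alpaka API). The named critical section excludes only other code entering the same section, so an operation on the same address that uses a different mechanism would not be serialized against it, which is presumably why only case 2 can be promised.

```cpp
#include <cassert>

// Hypothetical user-defined type with no hardware atomic support.
struct Vec2
{
    double x;
    double y;
};

// Hypothetical helper: an "atomic" add guarded by a named critical section.
// Returns the value stored before the update.
inline Vec2 criticalAdd(Vec2* addr, Vec2 value)
{
    Vec2 old;
    #pragma omp critical(vec2_atomic)
    {
        old = *addr;
        addr->x += value.x;
        addr->y += value.y;
    }
    return old;
}

// Adds {1, 2} to a shared Vec2 `count` times from a parallel loop.
Vec2 sumInParallel(int count)
{
    assert(count >= 0);
    Vec2 total{0.0, 0.0};
    #pragma omp parallel for
    for (int i = 0; i < count; ++i)
    {
        criticalAdd(&total, Vec2{1.0, 2.0});
    }
    return total;
}
```

With `sumInParallel(1000)` the result should be `{1000.0, 2000.0}`, since all updates pass through the same critical section.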

Documentation

BenjaminW3 commented 6 years ago

What is the behaviour in CUDA? We should not aim for more than CUDA is guaranteeing.

psychocoderHPC commented 6 years ago

IMO case 4 is fulfilled for CUDA; I can't find any source that says otherwise. Testing this is not easy: every passing run of a test for case 4 has to be treated as a possible false positive (due to the nature of race conditions), so only a failing test is conclusive.

psychocoderHPC commented 6 years ago

If I implement AtomicOmpCritSec with omp atomic capture, it will be faster than the new AtomicStlLock #398, but it will only support fundamental types.

btw: CUDA also supports only a handful of types.

I think it is safe to define atomics in alpaka as in case 4, as long as CUDA also only supports a few types (`float`, `double`, `int`, `unsigned int`).

BenjaminW3 commented 6 years ago

I would expect CUDA to implement Case 3 but we might never find out.

psychocoderHPC commented 6 years ago

Not sure. Case 3 means that you can mix 64-bit and 32-bit types, because only the address counts.

psychocoderHPC commented 6 years ago

Is there anything against the case 4 definition from your side? If not I will implement the new omp atomics as case 4.

e.g.

T old;
auto & ref(*addr);
// atomically update ref, but capture original value in old
#pragma omp atomic capture
{ old = ref; ref += value; }

BenjaminW3 commented 6 years ago

As of now, I see nothing that speaks against it

j-stephan commented 2 years ago

@psychocoderHPC Was this fixed by #664?