Implement atomicrmw CodeGen

RV does not support the LLVM atomicrmw instruction yet (https://llvm.org/docs/LangRef.html#atomicrmw-instruction). Currently, RV does not treat its lane threads as concurrent threads in the sense of the LLVM execution model, so atomicrmw remains scalar.

What needs to change: the result of atomicrmw must always be treated as a varying value. Beyond that, this is mostly an RV codegen issue (NatBuilder.cpp).

When the backend vectorizes an atomic instruction, it should apply the atomic's operation (add, umin, umax, xor, ...) to reduce the value vector to a scalar and emit just one atomicrmw with the reduced value.
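The reduce-then-update idea can be modelled in plain C++ without any LLVM or RV APIs. The following is only a scalar sketch for the `add` case: `kLanes`, `vectorAtomicAdd`, and the per-lane `mask` are hypothetical names introduced for illustration, and `std::atomic` stands in for the memory location the atomicrmw targets. Handling inactive lanes through the neutral element of the operation is an assumption of this sketch, not something the issue specifies.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kLanes = 4; // hypothetical vector width

// Models a vectorized `atomicrmw add`: reduce the values of the active lanes
// to one scalar, then perform a single atomic update on memory.
// Returns the old memory value, as a scalar atomicrmw would.
int vectorAtomicAdd(std::atomic<int> &mem,
                    const std::array<int, kLanes> &values,
                    const std::array<bool, kLanes> &mask) {
  int reduced = 0; // neutral element of add; inactive lanes contribute nothing
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (mask[lane])
      reduced += values[lane];
  }
  return mem.fetch_add(reduced); // the only atomic operation issued
}
```

Other operations would follow the same shape with a different reduction and neutral element, for example 0 for xor and the maximum unsigned value for umin.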

Two things are tricky about atomicrmw:

- Fairness: who "wins" in a vector xchg? RV does not give any (lane) thread fairness or even liveness guarantees.
- The result (vector) value: what will the returned vector value be? The backend will need to emit a prefix-sum-like operation over the vector being reduced to simulate the incrementally updated value for each lane.
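One way to picture the result vector for the `add` case is an exclusive prefix sum combined with the old memory value. The sketch below is a scalar C++ model, not RV code: `kLanes` and `vectorAtomicAddResults` are hypothetical names, all lanes are assumed active, and lanes are assumed to take effect in ascending lane order, which is an ordering this sketch picks and not one RV guarantees.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kLanes = 4; // hypothetical vector width

// Models the per-lane results of a vectorized `atomicrmw add`, assuming the
// lanes are applied in ascending lane order (an assumption of this sketch).
std::array<int, kLanes>
vectorAtomicAddResults(std::atomic<int> &mem,
                       const std::array<int, kLanes> &values) {
  // Exclusive prefix sum: prefix[i] is the sum of values[0..i-1].
  std::array<int, kLanes> prefix{};
  int running = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    prefix[lane] = running;
    running += values[lane];
  }

  // `running` now holds the full reduction; issue the single atomic update.
  int old = mem.fetch_add(running);

  // Each lane sees the old value plus the contributions of the lanes before it.
  std::array<int, kLanes> results{};
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    results[lane] = old + prefix[lane];
  return results;
}
```

With memory initially 10 and lane values 1, 2, 3, 4, the lanes observe 10, 11, 13, 16 and memory ends at 20. Under the same ordering assumption, a vector xchg would store the last lane's value and hand each lane the value of the lane before it, which is one possible answer to the fairness question but not a fair one.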

cdl-saarland / rv
