Atomic should be close to memory hierarchy to avoid live-lock, XiangShan implement this kind of Atomic is a bad design, which may introduce additional latency to atomic instruction: Xiangshan need acquire block from memory hierarchy, this will pollute L1D$, and may probe other cores(additional latency from coherence management).
If different cores try to lock a same address with atomic instructions, this will introduce live locks. which is dangerous for a bus interconnection!
https://github.com/OpenXiangShan/XiangShan/blob/ab2d19052a364aaeac7f77d1ad8c34b424f7552a/src/main/scala/xiangshan/backend/MemBlock.scala#L125
Here is what we didn't in RocketChip https://github.com/chipsalliance/rocket-chip/blob/86a2f2cca699f149bcc082ef2828654a0a4e3f4b/src/main/scala/rocket/DCache.scala#L558-L573
Why this is not good?
Atomic should be close to memory hierarchy to avoid live-lock, XiangShan implement this kind of Atomic is a bad design, which may introduce additional latency to atomic instruction: Xiangshan need acquire block from memory hierarchy, this will pollute L1D$, and may probe other cores(additional latency from coherence management). If different cores try to lock a same address with atomic instructions, this will introduce live locks. which is dangerous for a bus interconnection!
How to resolve it?
TileLink directly support this via its spec. So bypass to bus interconnect will be better. I think this is something need to be managed by system cache or broadcast(coherence manager), since they are close to memory, but only one thing you need to handle additionally is periphery devices, since they are not handled by coherence. https://github.com/chipsalliance/rocket-chip/blob/86a2f2cca699f149bcc082ef2828654a0a4e3f4b/src/main/scala/subsystem/PeripheryBus.scala#L51-L58 You need a additional bus device to take over this.