Open ronawho opened 1 year ago
I'd be in favor of this, and it also seems thematically related to https://github.com/chapel-lang/chapel/issues/7629 (which I also continue to be in favor of). Both seem to take the theme of elevating min/max to more of an equal sibling to other operators.
I found a case where this would be particularly useful. I have some code that's trying to compute min/max/average for some data & average can be handled with atomic adds (for the number of items and the total, to divide later). But min/max are nowhere to be seen for atomics.
Also in thinking about this (before coming across this issue) I found https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p0493r5.pdf which is a newer version of the article Elliot linked to above. It includes an example implementation which they've benchmarked and shown to be about as fast as an ARMv8 atomic-max instruction.
template <typename T>
T atomic_fetch_max_explicit(atomic<T>* pv,
typename atomic<T>::value_type v,
memory_order m) noexcept {
auto const mr = drop_release(m);
auto t = (mr != m) ? pv->fetch_add(0, m) : pv->load(mr);
while (max(v, t) != t) {
if (pv->compare_exchange_weak(t, v, m, mr))
return t;
}
return t;
}
It's not obvious to me if we'd want to implement min/max as read-conditionally-modify operations or a read-modify-write operations.
This newer C++ proposal is always RMW if the memory order is sequentially consistent, release, or acq_rel
ordering, but doesn't fret about it for relaxed or acquire ordering. (That is, assuming drop_release
does what I think it does. I can't find documentation or implementation of that, but I have a pretty high certainty that the fetch_add(0)
is required for memory_order_seq_cst
)
I've created PR #25900 with an initial attempt at this.
Our atomic support is largely based on C11/C++11, so which operations we support is also largely the same. The main current exception is that we also support operations on
real
's, but C++20 added support for that too. I think there'd be value in adding support for an atomicmin
andmax
as well. We have a few tests that manually implement atomic max (see 9689096677884fcf35b56ba7c22ec5172c65039b) and we have support for min/max reductions, which could be optimized by doing the cross task reductions with atomics.For other motivation, there's hardware support in some HPC networks and most communication libraries support them (including gasnet, ugni and libfabric.) Rust also supports atomic min/max and there's a proposal to add support in C++: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p0493r3.pdf.
It's not obvious to me if we'd want to implement min/max as read-conditionally-modify operations or a read-modify-write operations. The C++ proposal above uses RMW to match other operators, and I'd default to following that and optionally doing a RCM if users opt-in to relaxed memory order or something. I think in practice min/max won't be highly contended operations so there probably won't be much of a performance difference (RMW means all writers need exclusive cache line access)