ORNL / cpp-proposals-pub

Collaborating on papers for the ISO C++ committee - public repo

P2689: Bounded atomic_ref and atomic_accessor LEWG Presentation 04/30/2024 #456

Open crtrott opened 4 months ago

crtrott commented 4 months ago

Atomic Refs Bound to Memory Orderings & Atomic Accessors

The mdspan paper P0009 listed atomic accessors as a reason for having accessors in the first place.

One of the use cases is parallel algorithms that update shared data in a way that would otherwise have data races.

Consider the histogram computation:

template<class ExecT>
void compute_histogram(ExecT exec, float bin_size,
               std::mdspan<int, std::dextents<size_t,1>> output,
               std::mdspan<float, std::dextents<size_t,1>> data) {
  static_assert(std::is_execution_policy_v<ExecT>);

  std::for_each(exec, data.data_handle(), data.data_handle()+data.extent(0), [=](float val) {
    int bin = std::abs(val)/bin_size;
    // clamp to the valid index range [0, extent-1]; std::clamp needs all arguments of one type
    bin = std::clamp(bin, 0, int(output.extent(0)) - 1);
    output[bin]++;  // plain increment: a data race unless ExecT is sequenced_policy
  });
}

Unless ExecT is sequenced_policy, the update needs to happen atomically.

With just atomic_ref one could do:

  std::for_each(exec, data.data_handle(), data.data_handle()+data.extent(0), [=](float val) {
    int bin = std::abs(val)/bin_size;
    bin = std::clamp(bin, 0, int(output.extent(0)) - 1);
    std::atomic_ref(output[bin])++;  // atomic, but always sequentially consistent
  });

This paper proposes a way to do that for general mdspans without writing out atomic_ref everywhere inside a complex algorithm. It also addresses the fact that the simple ++ on an atomic_ref cannot use relaxed memory order, which is sufficient for this kind of accumulation.

What this paper proposes

Histogram with atomic accessor

// sequenced_policy does not attach atomic accessor
template<class T, class Extents, class LayoutPolicy>
auto add_atomic_accessor_if_needed(
    std::execution::sequenced_policy, std::mdspan<T, Extents, LayoutPolicy> m) {
  return m;
}

// parallel policies attach atomic accessor
// (atomic_accessor is the accessor proposed in this paper)
template<class ExecutionPolicy, class T, class Extents, class LayoutPolicy>
auto add_atomic_accessor_if_needed(
    ExecutionPolicy, std::mdspan<T, Extents, LayoutPolicy> m) {
  return std::mdspan(m.data_handle(), m.mapping(), atomic_accessor<T>());
}

template<class ExecT>
void compute_histogram(ExecT exec, float bin_size,
               std::mdspan<int, std::dextents<size_t,1>> output,
               std::mdspan<float, std::dextents<size_t,1>> data) {
  static_assert(std::is_execution_policy_v<ExecT>);

  auto accumulator = add_atomic_accessor_if_needed(exec, output);

  std::for_each(exec, data.data_handle(), data.data_handle()+data.extent(0), [=](float val) {
    int bin = std::abs(val)/bin_size;
    // clamp to the valid index range [0, extent-1]
    bin = std::clamp(bin, 0, int(output.extent(0)) - 1);
    // this is now atomic for parallel policies
    accumulator[bin]++;
  });
}

Previous Question: Why not expose the templated type?

Wording Considerations

Open Questions