Open mppf opened 9 months ago
> ideally, a documented way to choose how many tasks to divide up a range into (as `_computeNumChunks` does today in the standard library)
Related to this, there is the RangeChunk library, which has never gotten as much use or attention as it probably deserves. At one point, we had hoped to retire all internal ways of chunking things up in favor of using the RangeChunk library as a means of exercising it and making sure it was working as intended, but I don't believe we managed to complete that vision.
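For context, RangeChunk's `chunks()` iterator yields roughly even subranges of a range. A minimal sketch of using it to hand one chunk to each task (the chunk count of 4 here is arbitrary, chosen just for illustration):

```chapel
use RangeChunk;

// Split 1..100 into 4 roughly equal subranges and give each its own task.
coforall c in chunks(1..100, 4) do
  writeln("task handling subrange ", c);
```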
Here's a cute use of the `serial` statement that we should wrestle with when we take this issue up. From the revcomp8.chpl benchmark:

```chapel
serial (nextSeqStart < endOfRead) do
  forall j in 0..<endOfRead do
    buff[j] = buff[j + nextSeqStart];
```
The notion is that we're doing a memcpy-like operation, and if the source and destination don't overlap, we can do it in parallel, but if they do overlap, we need to do it serially.
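One straightforward way to express the same intent without `serial` is to write the branch explicitly; this is just a sketch of the same logic from the benchmark, not a proposed language feature:

```chapel
// Same memcpy-like copy as above, with the overlap test made explicit.
if nextSeqStart >= endOfRead {
  // source and destination don't overlap: a parallel copy is safe
  forall j in 0..<endOfRead do
    buff[j] = buff[j + nextSeqStart];
} else {
  // regions overlap: copy serially so earlier writes don't clobber reads
  for j in 0..<endOfRead do
    buff[j] = buff[j + nextSeqStart];
}
```

The downside, of course, is that the loop body is duplicated, which is part of what made the `serial` form attractive.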
PR #24387 makes the `serial` statement unstable (to be deprecated). This issue discusses what would be a suitable replacement for it.

A number of the use cases for `serial` involve disabling parallelism when it is not useful. For example, it can disable parallelism in a call to a function that uses parallel constructs but is being run within another parallel loop.

The thinking is that `serial` is a bit too blunt of a tool for this use case. Instead, we would like for parallel code that might need to run nested in another parallel construct to be aware of this issue. That's already the case for something like `forall i in 1..n`, because the parallel iterators in the library check the number of running tasks and adjust how many to run based on that. But, to make this strategy reasonable for arbitrary programs not in the standard library, we need to have more user-facing functionality:

* ideally, a documented way to choose how many tasks to divide up a range into (as `_computeNumChunks` does today in the standard library)

There is also a proposal in #13518 to enable patterns such as `begin if (notTooBusy) { ... }` as an alternative to the conditional `serial` statement.

For `forall` statements, avoiding unneeded parallel overhead might be handled through a loop configuration (see issue #16405 and https://github.com/Cray/chapel-private/issues/5216). This case comes up a lot since there are lots of implicitly parallel operations (e.g. allocating an array or performing a reduction). Another possibility is to have something like `serial` that only applies to `forall` but not to `begin`.
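As a sketch of the "check the number of running tasks" strategy that the library's parallel iterators use, a user-level version might query the locale directly. The `max(1, ...)` policy below is an illustrative stand-in, not the standard library's exact rule (`_computeNumChunks` itself is internal and undocumented):

```chapel
// Pick a task count based on how busy the current locale already is.
// `here.maxTaskPar` and `here.runningTasks()` are existing locale queries;
// the policy here is a simplified illustration of what the standard
// iterators do internally.
proc numTasksToUse(): int {
  return max(1, here.maxTaskPar - here.runningTasks());
}

coforall tid in 0..<numTasksToUse() do
  writeln("running as task ", tid);
```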