Open mppf opened 9 months ago
> ideally, a documented way to choose how many tasks to divide up a range into (as `_computeNumChunks` does today in the standard library)
Related to this, there is the RangeChunk library, which has never gotten as much use or attention as it probably deserves. At one point, we had hoped to retire all internal ways of chunking things up in favor of using the RangeChunk library as a means of exercising it and making sure it was working as intended, but I don't believe we managed to complete that vision.
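For context, RangeChunk's `chunks()` iterator yields roughly even subranges of a range. A minimal sketch of using it to hand one chunk to each task (the chunk count of 4 here is arbitrary, chosen just for illustration):

```chapel
use RangeChunk;

// Split 1..100 into 4 roughly equal subranges and give each its own task.
coforall c in chunks(1..100, 4) do
  writeln("task handling subrange ", c);
```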
Here's a cute use of the `serial` statement that we should wrestle with when we take this issue up. From the revcomp8.chpl benchmark:

```chapel
serial (nextSeqStart < endOfRead) do
  forall j in 0..<endOfRead do
    buff[j] = buff[j + nextSeqStart];
```
The notion is that we're doing a memcpy-like operation, and if the source and destination don't overlap, we can do it in parallel, but if they do overlap, we need to do it serially.
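One straightforward way to express the same intent without `serial` is to write the branch explicitly; this is just a sketch of the same logic from the benchmark, not a proposed language feature:

```chapel
// Same memcpy-like copy as above, with the overlap test made explicit.
if nextSeqStart >= endOfRead {
  // source and destination don't overlap: a parallel copy is safe
  forall j in 0..<endOfRead do
    buff[j] = buff[j + nextSeqStart];
} else {
  // regions overlap: copy serially so earlier writes don't clobber reads
  for j in 0..<endOfRead do
    buff[j] = buff[j + nextSeqStart];
}
```

The downside, of course, is that the loop body is duplicated, which is part of what made the `serial` form attractive.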
PR #24387 makes the `serial` statement unstable (to be deprecated). This issue discusses what would be a suitable replacement for it.

A number of the use cases for `serial` involve disabling parallelism when it is not useful. For example, it can disable parallelism in a call to a function that uses parallel constructs but is being run within another parallel loop.

The thinking is that `serial` is a bit too blunt of a tool for this use case. Instead, we would like for parallel code that might need to run nested in another parallel construct to be aware of this issue. That's already the case for something like `forall i in 1..n`, because the parallel iterators in the library check the number of running tasks and adjust how many to run based on that. But, to make this strategy reasonable for arbitrary programs not in the standard library, we need to have more user-facing functionality:

* ideally, a documented way to choose how many tasks to divide up a range into (as `_computeNumChunks` does today in the standard library)

There is also a proposal in #13518 to enable patterns such as `begin if (notTooBusy) { ... }` as an alternative to the conditional `serial` statement.

For `forall` statements, avoiding unneeded parallel overhead might be handled through a loop configuration (see issue #16405 and https://github.com/Cray/chapel-private/issues/5216). This case comes up a lot since there are lots of implicitly parallel operations (e.g. allocating an array or performing a reduction). Another possibility is to have something like `serial` that only applies to `forall` but not to `begin`.
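As a sketch of the "check the number of running tasks" strategy that the library's parallel iterators use, a user-level version might query the locale directly. The `max(1, ...)` policy below is an illustrative stand-in, not the standard library's exact rule (`_computeNumChunks` itself is internal and undocumented):

```chapel
// Pick a task count based on how busy the current locale already is.
// `here.maxTaskPar` and `here.runningTasks()` are existing locale queries;
// the policy here is a simplified illustration of what the standard
// iterators do internally.
proc numTasksToUse(): int {
  return max(1, here.maxTaskPar - here.runningTasks());
}

coforall tid in 0..<numTasksToUse() do
  writeln("running as task ", tid);
```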