chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.77k stars 417 forks source link

begin? for optional tasks #16387

Open mppf opened 3 years ago

mppf commented 3 years ago

Split-off from #13518.

Currently, begin, cobegin, and coforall always create qthreads-style tasks. However, for recursive algorithms, we would like it be easy to create tasks only when there aren't already a lot of tasks going.

Some parallel languages have a begin like construct is inherently optional - the implementation doesn't create a task if there is already enough parallel work. For example Cilk tasks work that way. This even has an impact on compiler analysis and optimization for the parallel tasks. Another term for this is to say that these tasks are serializable.

The begin constructs in Chapel however insist on creating the tasks because these tasks could do things like wait on each other with sync variables. This makes it awkward to write efficient recursive algorithms using these constructs (see for example a specific case in #13518). In contrast, using forall is pretty easy even in recursive algorithms because the built-in iterators supporting forall limit the number of tasks created based upon here.runningTasks() (typically done in _computeNumChunks and controlled by dataParTasksPerLocale / dataParIgnoreRunningTasks). But, using forall for algorithms like merge sort or quick sort that have 2 recursive problems is awkward and strange as well.

Additionally, Chapel has a serial blocks - serial { } - which serializes all tasks launched within the block. This presents a problem for libraries. Consider a library that - somewhere deep inside a call - needs to create a task that can block in order to function correctly. For example, it might need to create some kind of helper task like a progress thread. If the program calling the library makes the call within a serial block, then the library will simply not function. (It will probably deadlock). In a way, use of the serial block only makes sense if all tasks called in its body (including in called functions) are serializable. But, that property is very hard to see if the serial block contains function calls to libraries. Would documentation for all library functions need to include a note about whether or not all tasks the function creates are serializable?

This issue is proposing that users be able to opt into creating optional tasks. Similar to how forall works in practice, the implementation would automatically throttle the number of tasks. The serial block would then be adjusted to only affect these optional tasks. In other words, code creating non-optional tasks within a serial block would still create those tasks. (Note that this change to how serial works would be a breaking change).

One potential alternative would be for compiler analysis to identify begin / cobegin / coforall bodies that are serializable. However even so it would probably be useful to have a way that users can indicate optional tasks. Additionally, the interaction with the serial block suggests different behavior for optional and non-optional tasks. Having the serial block behave differently based upon analysis of contained task bodies does not seem like a good idea.

It would also be possible to migrate the task-throttling that happens currently in iterators supporting forall into using these optional tasks.

For specific syntax, this issue proposes that begin?, cobegin?, and coforall? create optional tasks; while begin, cobegin, and coforall would not. The ? is by analogy to nilability support - when it is applied to a class type it means an optional class type. begin?, cobegin?, and coforall? would be parsed as a single token.

For example (from #13518) with merge sort, the code could include a recursive call like this, and the tasks created would automatically be serialized by the implementation once there is enough parallelism.

  proc _MergeSort(...) {
    ...
    cobegin? {
      { _MergeSort(...); }
      { _MergeSort(...); }
    }
    ...
  }
e-kayrakli commented 3 years ago

This issue is proposing that users be able to opt into creating optional tasks. Similar to how forall works in practice, the implementation would automatically throttle the number of tasks. The serial block would then be adjusted to only affect these optional tasks. In other words, code creating non-optional tasks within a serial block would still create those tasks. (Note that this change to how serial works would be a breaking change).

I think this part needs a bit of a clarification. What would happen if there's a non-optional begin in the dynamic scope of a serial? That's an error, right (either compile time or execution time)? And if that's the case and that's the reason this would be a breaking change.

Assuming that's the case; one alternative idea that came up in our chat yesterday was to have begin!, cobegin!, coforall! instead of ? variants, and have those mean "mandatory task creation". This may help us stay closer to semantics we have today. I am not going so far as to say that'd be non-breaking, though.

The other way that you may have meant might be "non optional begin overrides serial" but that feels wrong.

mppf commented 3 years ago

The other way that you may have meant might be "non optional begin overrides serial" but that feels wrong.

That is what I meant. I'm happy to consider alternatives but it made sense to me for serial to squash begin? but not begin. If we had begin! then it would probably make sense for serial { begin { ... } } to be an error.

e-kayrakli commented 3 years ago

If we had begin! then it would probably make sense for serial { begin { ... } } to be an error.

That would be more appealing to me because:

mppf commented 3 years ago

Not a breaking change

It's still a breaking change, because certain patterns of serial { begin { } } that squash the begin today will become an error.

Edit: I mean that having begin? and begin! and begin - and having begin? in serial be squashed; begin in serial be an error; and begin! in serial create a task - that would be breaking.

You might have meant just "Let's interpret begin as optional begin and add begin! to create a task even if we are in a serial block". I think that is also breaking once the compiler has any analysis that relies on begin meaning optional begin.

e-kayrakli commented 3 years ago

You might have meant just "Let's interpret begin as optional begin and add begin! to create a task even if we are in a serial block". I think that is also breaking once the compiler has any analysis that relies on begin meaning optional begin.

This was my intention, yeah. I don't think an inner begin should override an outer serial. The compiler can do whatever it wants (or whatever it does today) for begin. begin! OTOH, would halt if we ever touch it from inside a serial block, for example. Just like a! halts when a is nil

mppf commented 3 years ago

We are discussing here 3 possible variations on behaviors for begin/cobegin/coforall:

  1. automatic task throttling and serial squashes
  2. no automatic task throttling but serial squashes
  3. no automatic task throttling and serial does not squash

We currently have the keywords mean 2. It's my view that 1 and 3 are more reasonable. However, we could make different keywords for each, e.g.

  1. begin? -- automatic task throttling and serial squashes
  2. begin -- no automatic task throttling but serial squashes
  3. begin! -- no automatic task throttling and serial does not squash

Doing that would not be a breaking change, but it creates 3 things where arguably we could have 2. I am curious if others think we need all 3 variants.

(If we were to move to only variants 1 and 3, that would be a breaking change, no matter which behavior begin uses, since it currently uses variant 2).

bradcray commented 3 years ago

We currently have the keywords mean 2. It's my view that 1 and 3 are more reasonable.

What is the point of the serial block in world 1 or 3 (assuming no other changes to the language)? In world 1, the tasks are automatically throttled, so if that's working correctly/well, it seems like the user would never be motivated to use the serial block. Whereas in world 3, it seems to have no purpose.

mppf commented 3 years ago

(To be clear: I was supposing that users have the ability to select between 1 and 3; I wasn't proposing that we change everything to 1 or 3.).

But I see your point; using serial with a 1 or 3 style begin seems a little bit useless. In practice when developing applications I think I would personally rather have automatic task throttling (and/or work stealing of not-yet-scheduled tasks) than serial blocks if I had to choose one. But for begin specifically, to move away from serial we would also need also a feature we do not have yet - namely as back-pressure on code creating tasks when there are too many tasks enqueued. Otherwise we just keep enqueuing until we run out of memory.

When does this kind of thing come up? It is not so rare to write a parallel application where there is a parallel region and then more stuff that the language considers parallel - like forall expressions - within it. If you have already worked hard to create load balance and parallelism in the outer region then these inner parallelisms just create overhead. Hence the serial block can wrap around them to avoid that overhead. (In some ways this is similar to the way the local block removes overhead within a carefully constructed application).

So, I suppose it would depend on to what extent automatic task throttling could match the performance of serial. Note though that I'm not so convinced that we'd keep automatic task throttling once we have work stealing for not-yet-scheduled tasks. (Except for the back-pressure for begins).

I don't think 3 is something you would intentionally use with serial blocks - more like adapting to the fact that they already existed. If we were to remove serial blocks, we wouldn't need option 3.

mppf commented 3 years ago

Noting that #12306 is related in that we could write begin? on Something { } to create aggregated remote task launches.

mppf commented 3 years ago

Noting that #9727 discusses the desire for begin on (or with this issue, maybe begin? on) to aggregate the task launches.