Open manopapad opened 1 year ago
Do you want this ability just inside the mapper, or also in the application? The mapper is probably straightforward since there's no deferred execution in the mapper, but if you want it in the application that will be a bit more work.
Does deferred execution matter? If we have the following code:
task foo() {
// do stuff
push green
// do stuff
launch task B
pop
// do stuff
}
The "launch task B" call is a deferred execution, and the actual task B will be visualized somewhere else, not inside task foo.
It depends entirely on what "do stuff" is. If "do stuff" is perform computation in this task then no. If "do stuff" is launch some sub-tasks, then it matters quite a lot.
My use case is for the mapper, but @rohany expressed interest in having user-controlled ranges in application tasks (I'll let him explain his use case more), so I made the ask a bit more general.
I was assuming that the range-enter / range-exit annotations would not consider deferred execution. I.e. all that the runtime would need to do is dump a marker at the point when control reaches a "push" or "pop" point, regardless of any asynchronous work that had been launched prior to that point. If the code did this:
enter task
push "range 1"
launch sub-task
pop
wait on sub-task
then the range on the profile would only include the launching time, not the waiting. This is sufficient for profiling mapper calls, but maybe @rohany had different requirements?
I agree with @manopapad: the range will only include the launching time, and the wait will be shown as a WaitInterval.
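To make the proposed semantics concrete, here is a minimal sketch (not Legion code) of a recorder that simply logs a marker whenever control reaches a push or pop point. `RangeRecorder` and `FakeClock` are hypothetical names invented for illustration, and the fake clock stands in for real timestamps so the result is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable


class FakeClock:
    """Deterministic stand-in for a timestamp source (assumption)."""

    def __init__(self):
        self.now = 0

    def advance(self, dt):
        self.now += dt

    def __call__(self):
        return self.now


@dataclass
class RangeRecorder:
    """Hypothetical recorder: dumps a marker when control reaches push/pop."""

    clock: Callable[[], int]
    open_ranges: list = field(default_factory=list)
    closed: list = field(default_factory=list)

    def push(self, name):
        self.open_ranges.append((name, self.clock()))

    def pop(self):
        name, start = self.open_ranges.pop()
        self.closed.append((name, start, self.clock()))


clock = FakeClock()
rec = RangeRecorder(clock)

rec.push("range 1")
clock.advance(2)   # launching the sub-task; the launch returns immediately
rec.pop()
clock.advance(50)  # waiting on the sub-task happens after the pop

print(rec.closed)
```

The recorded range spans only the two ticks spent launching; the fifty ticks of waiting fall outside it, which matches the behavior described above.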
> My usecase is for the mapper, but @rohany expressed interest in having user-controlled ranges in application tasks (I'll let him explain his usecase more), so I made the ask a bit more general.
I think having it in the application would be quite beneficial to many users. The use case I have in mind came up while thinking about adding more sophisticated fusing / partitioning logic to legate. When we batch many tasks together and try to fuse or bulk-partition them, it would be nice to demarcate the ranges that we actually spend thinking about this, to understand the overheads involved. I suspect other applications that do non-trivial reasoning in top-level tasks would also be interested in this. What I don't think we should do is try to support something like this at small granularities to help profile hot loops or something in compute task bodies, as that's something better left for other benchmarking tools.
I suspect everyone is going to want what @rohany is asking for. That will take more time to implement, but I'd rather do that than something that just hacks in the range timing for launching tasks. If we're just going to support this in the mapper to start, though, then that is an obvious thing to do since there is no deferred execution there.
> demarcate the ranges that we actually spend thinking about this
It didn't sound to me like Rohan is asking for anything different than I did. He also just wants to show on the profile a range of time spent doing computation in the task, which includes just time in the task, and doesn't recursively include any time spent in launched sub-tasks.
Another useful case for application-level support we found is to analyze top-level tasks. With the current profiler, the users only see a single big top-level task running as follows:
But, with this functionality, users can see more fine-grained tasks within a single big top-level task. This will allow us to investigate issues with top-level tasks, e.g., #1448.
An alternative visualization would be to "stack" the different ranges:
We could possibly do this "on demand", when the user hovers/clicks on a box (otherwise we only show the top parent box).
The application code would be able to open and close "ranges" inside a task, and associate a name with each range. Nesting of ranges would be supported.
In the profile, these ranges would be overlaid directly on top of the parent task's box (like a wait interval is visualized today):
Note that a wait can occur while we have pushed a user range, so on the profile we would go to a "waiting" state, then back into the last used color.
This feature would be useful for debugging mapper performance (we would mark on the profile which part of a mapper call task is executing user code, and which is doing runtime work), and for apps to visualize how much time certain parts of a task are taking.
On the profiler side, this could be done as a generalization of the existing TaskWaiter infrastructure; instead of having a fixed sequence of points (created, ready, start, stop), we could generalize this to any number of intervals (of which the defaults happen to be running/blocked/ready).
Note that we can't actually paint over rectangles in the new UI. So at some point in the pipeline we need to slice things into non-overlapping segments.
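One possible way to do that slicing is sketched below. It is not the profiler's actual code: `slice_segments` and its tuple layouts are assumptions made for illustration. It assumes ranges are properly nested with distinct start times, labels time inside a wait as "waiting", and otherwise uses the innermost open range, falling back to "running".

```python
def slice_segments(task, ranges, waits):
    """Flatten a task box, nested user ranges, and waits into disjoint segments.

    task   -- (start, stop) of the parent task
    ranges -- list of (name, start, stop), properly nested (assumption)
    waits  -- list of (start, stop)
    """
    t0, t1 = task
    cuts = {t0, t1}
    for _, s, e in ranges:
        cuts.update((s, e))
    for s, e in waits:
        cuts.update((s, e))
    points = sorted(p for p in cuts if t0 <= p <= t1)

    segments = []
    for a, b in zip(points, points[1:]):
        if any(s <= a and b <= e for s, e in waits):
            label = "waiting"
        else:
            # The covering range that started last is the innermost one.
            covering = [(s, name) for name, s, e in ranges if s <= a and b <= e]
            label = max(covering)[1] if covering else "running"
        if segments and segments[-1][2] == label:
            # Merge with the previous segment when the label is unchanged.
            segments[-1] = (segments[-1][0], b, label)
        else:
            segments.append((a, b, label))
    return segments


segments = slice_segments(
    task=(0, 100),
    ranges=[("outer", 10, 80), ("inner", 30, 50)],
    waits=[(40, 60)],
)
for seg in segments:
    print(seg)
```

In this example the wait begins while "inner" is open and ends after "inner" has closed, so the timeline goes from "inner" to "waiting" and then resumes in "outer", the range that is still open at that point.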