StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Sub-task ranges on profiles #1439

Open manopapad opened 1 year ago

manopapad commented 1 year ago

The application code would be able to open and close "ranges" inside a task, and associate a name with each range. Nesting of ranges would be supported.

task foo() {
  // do stuff
  push "part 1"
  // do stuff
  push "sub-part 1a"
  // do stuff
  pop
  // do stuff
  pop
  // do stuff
}

In the profile, these ranges would be overlaid directly on top of the parent task's box (like a wait interval is visualized today):

overay-profiler-boxes

Note that a wait can occur while we have pushed a user range, so on the profile we would go to a "waiting" state, then back into the last used color.
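To illustrate the "go to waiting, then back into the last used color" behavior, here is a minimal sketch that splits one user range around any wait intervals falling inside it. The function name, the tuple layout, and the state labels are hypothetical and are not part of Legion Prof; timestamps are treated as half-open `[start, stop)` integers.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, stop), half-open; hypothetical layout


def split_around_waits(rng: Interval, waits: List[Interval]) -> List[Tuple[str, int, int]]:
    """Cover `rng` with ("range" | "waiting", start, stop) segments."""
    start, stop = rng
    segments = []
    cursor = start  # everything before the cursor has already been emitted
    for w_start, w_stop in sorted(waits):
        # Clip the wait to the part of the range not yet emitted.
        w_start = max(w_start, cursor)
        w_stop = min(w_stop, stop)
        if w_start >= w_stop:
            continue  # wait lies outside the range or was already covered
        if cursor < w_start:
            segments.append(("range", cursor, w_start))
        segments.append(("waiting", w_start, w_stop))
        cursor = w_stop
    if cursor < stop:
        # Back to the range's own color after the last wait.
        segments.append(("range", cursor, stop))
    return segments
```

A range from 10 to 50 with a wait from 20 to 30 would come out as three segments: range, waiting, range.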

This feature would be useful for debugging mapper performance (we would mark on the profile which part of a mapper call task is executing user code, and which is doing runtime work), and for apps to visualize how much time certain parts of a task are taking.

On the profiler side, this could be done as a generalization of the existing TaskWaiter infrastructure; instead of having a fixed sequence of points (created, ready, start, stop), we could generalize this to any number of intervals (of which the default happen to be running/blocked/ready).

Note that we can't actually paint over rectangles in the new UI. So at some point in the pipeline we need to slice things into non-overlapping segments.
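One possible way to do that slicing, as a rough sketch: cut the task's interval at every range boundary and label each piece with the innermost box covering it. This assumes ranges are properly nested (as push/pop guarantees) and are listed in push order; `flatten` and the `(name, start, stop)` tuples are hypothetical names, not existing profiler code.

```python
from typing import List, Tuple

Box = Tuple[str, int, int]  # (name, start, stop); hypothetical layout


def flatten(task: Box, ranges: List[Box]) -> List[Box]:
    """Slice a task and its nested ranges into non-overlapping segments."""
    _, t0, t1 = task
    boxes = [task] + list(ranges)
    # Every start/stop inside the task is a potential cut point.
    points = sorted({p for _, s, e in boxes for p in (s, e) if t0 <= p <= t1})
    segments = []
    for a, b in zip(points, points[1:]):
        # Boxes that fully cover this slice; the task itself always does.
        covering = [(s, i, n) for i, (n, s, e) in enumerate(boxes) if s <= a and b <= e]
        # Innermost box = latest start; ties go to the box pushed later.
        _, _, label = max(covering)
        segments.append((label, a, b))
    return segments
```

For the `foo` example above, with "part 1" spanning 10 to 80 and "sub-part 1a" spanning 30 to 50 inside a task spanning 0 to 100, this produces five segments: `foo`, `part 1`, `sub-part 1a`, `part 1`, `foo`.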

lightsighter commented 1 year ago

Do you want this ability just inside the mapper, or also in the application? The mapper is probably straightforward since there's no deferred execution in the mapper, but if you want it in the application that will be a bit more work.

eddy16112 commented 1 year ago

Does deferred execution matter? If we have the following code:

task foo() {
  // do stuff
  push green
  // do stuff
  launch task B
  pop
  // do stuff
}

Launching task B is a deferred operation, so the actual execution of task B will be visualized somewhere else on the profile, not inside task foo.

lightsighter commented 1 year ago

It depends entirely on what "do stuff" is. If "do stuff" is perform computation in this task then no. If "do stuff" is launch some sub-tasks, then it matters quite a lot.

manopapad commented 1 year ago

My usecase is for the mapper, but @rohany expressed interest in having user-controlled ranges in application tasks (I'll let him explain his usecase more), so I made the ask a bit more general.

I was assuming that the range-enter / range-exit annotations would not consider deferred execution. I.e. all that the runtime would need to do is dump a marker at the point when control reaches a "push" or "pop" point, regardless of any asynchronous work that had been launched prior to that point. If the code did this:

enter task
push "range 1"
launch sub-task
pop
wait on sub-task

then the range on the profile would only include the launching time, not the waiting. This is sufficient for profiling mapper calls, but maybe @rohany had different requirements?
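A minimal sketch of that "dump a marker when control reaches push or pop" behavior, under the assumption that a range is just a pair of timestamps taken on the calling thread. `RangeRecorder` and its methods are hypothetical and are not a Legion API; the clock is injectable only so the behavior can be checked deterministically.

```python
import time
from typing import Callable, List, Tuple


class RangeRecorder:
    """Records (name, start, stop) for nested push/pop ranges (hypothetical)."""

    def __init__(self, clock: Callable[[], int] = time.perf_counter_ns):
        self._clock = clock
        self._open: List[Tuple[str, int]] = []        # stack of (name, start)
        self.closed: List[Tuple[str, int, int]] = []  # finished ranges

    def push(self, name: str) -> None:
        # Timestamp is taken when control reaches the push point.
        self._open.append((name, self._clock()))

    def pop(self) -> None:
        if not self._open:
            raise RuntimeError("pop without a matching push")
        name, start = self._open.pop()
        # Timestamp is taken when control reaches the pop point, regardless
        # of any asynchronous work launched in between.
        self.closed.append((name, start, self._clock()))
```

In the sequence above, the recorder would be popped before the wait on the sub-task, so the recorded range would end at the pop and the wait would fall outside it.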

eddy16112 commented 1 year ago

I agree with @manopapad: the range will only include the launching time, and the wait will be shown as a WaitInterval.

rohany commented 1 year ago

> My usecase is for the mapper, but @rohany expressed interest in having user-controlled ranges in application tasks (I'll let him explain his usecase more), so I made the ask a bit more general.

I think having it in the application would be quite beneficial to many users. The use case I'm imagining came up while thinking about adding some more sophisticated fusing / partitioning logic to legate. In the case where we batch many tasks up together and try to fuse or bulk-partition them, it would be nice to demarcate the ranges that we actually spend thinking about this, to understand the overheads involved. I suspect other applications that do non-trivial reasoning in top-level tasks would also be interested in this. What I don't think we should do is try to support this at small granularities, e.g. to profile hot loops in compute task bodies, as that's better left to other benchmarking tools.

lightsighter commented 1 year ago

I suspect everyone is going to want what @rohany is asking for. That will take more time to implement, but I'd rather do that than something that just hacks in the range timing for launching tasks. If we're only going to support this in the mapper to start, though, then that is an obvious thing to do, since there is no deferred execution there.

manopapad commented 1 year ago

> demarcate the ranges that we actually spend thinking about this

It didn't sound to me like Rohan is asking for anything different from what I asked for. He also just wants to show on the profile a range of time spent doing computation in the task, which includes only time in the task itself and doesn't recursively include any time spent in launched sub-tasks.

MadFunMaker commented 1 year ago

Another use case we found for application-level support is analyzing top-level tasks. With the current profiler, users see only a single big top-level task, as follows:

big-top-level

With this functionality, however, users could see finer-grained ranges within a single big top-level task. This would allow us to investigate issues with top-level tasks, e.g., #1448.

manopapad commented 12 months ago

An alternative visualization would be to "stack" the different ranges:

overay-profiler-boxes-2

We could possibly do this "on demand", when the user hovers/clicks on a box (otherwise we only show the top parent box).
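For the stacked layout, each box needs a row, and one simple choice is its nesting depth. A small sketch, assuming properly nested `(name, start, stop)` boxes with the parent task included; `assign_rows` is a hypothetical helper, not existing profiler code, and it is quadratic in the number of boxes, which seems acceptable if it only runs on demand for a single task.

```python
from typing import List, Tuple

Box = Tuple[str, int, int]  # (name, start, stop); hypothetical layout


def assign_rows(boxes: List[Box]) -> List[Tuple[str, int]]:
    """Return (name, row), where row is the number of boxes enclosing it."""
    rows = []
    for i, (name, s, e) in enumerate(boxes):
        depth = sum(
            1
            for j, (_, ps, pe) in enumerate(boxes)
            # An enclosing box covers this one and is not the same interval.
            if j != i and ps <= s and e <= pe and (ps, pe) != (s, e)
        )
        rows.append((name, depth))
    return rows
```

With the parent task on row 0, "part 1" would land on row 1 and "sub-part 1a" on row 2.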