StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Realm: Barrier profiling #1742

Open lightsighter opened 3 months ago

lightsighter commented 3 months ago

In the process of adding critical path analysis support in Legion, it's become apparent that barriers are very difficult to profile in a scalable way. With critical path support, Legion currently supports two ways of profiling barriers:

  1. Instead of arriving on a barrier directly when the arrival has a precondition event, we launch a no-op task with the same precondition as the arrival and request an operation timeline profiling response for it. The ready time of the no-op task tells us when the arrival would have been performed. We then use the reduction feature of Realm's barriers to determine the last arrival and its metadata, which is logged on all the nodes that subscribe to the barrier. This is the more scalable approach: if the barrier is on the critical path, we can tell users exactly which node to load next along the critical path without needing to analyze all the logs. The drawback is that the reduction has to be added to the barriers as part of the execution, which may slightly slow execution down.
  2. We simply log all the arrivals on all the nodes and allow Legion Prof to perform an offline analysis to determine the last arrival. Users opt into this version with the -lg:prof_all_critical_arrivals flag on the command line. This approach has the benefit of incurring the least amount of overhead when profiling and doing the most analysis offline, but it is prone to running out of memory on larger runs where we need to load lots of logs upfront to be able to do the analysis.
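
As a rough illustration of the reduction used in the first approach, here is a minimal self-contained sketch of a "keep the latest arrival" fold. It is only loosely modeled on the general shape of a reduction operator (an `apply` and a `fold` over a value type); `ArrivalInfo`, `LastArrivalReduction`, and every field name are hypothetical and are not taken from Legion or Realm.

```cpp
#include <cstdint>

// Hypothetical record describing one barrier arrival. The field names are
// illustrative assumptions, not Legion or Realm definitions.
struct ArrivalInfo {
  std::uint64_t ready_time_ns;  // ready time of the no-op task standing in for the arrival
  std::uint32_t node;           // node that performed the arrival
  std::uint64_t precondition;   // opaque ID of the arrival's precondition event (0 = none)
};

// Reduction that keeps whichever arrival happened last.
struct LastArrivalReduction {
  typedef ArrivalInfo LHS;
  typedef ArrivalInfo RHS;

  // Identity value: any arrival with a later ready time replaces it.
  static ArrivalInfo identity() { return ArrivalInfo{0, 0, 0}; }

  // True if `a` should win over `b`. Ties on time are broken by node ID so
  // the result does not depend on the order in which arrivals are folded.
  static bool later(const ArrivalInfo& a, const ArrivalInfo& b) {
    if (a.ready_time_ns != b.ready_time_ns)
      return a.ready_time_ns > b.ready_time_ns;
    return a.node > b.node;
  }

  // This sketch ignores EXCLUSIVE; a non-exclusive variant would need
  // atomic updates to be safe under concurrent folds.
  template <bool EXCLUSIVE>
  static void apply(LHS& lhs, RHS rhs) {
    if (later(rhs, lhs)) lhs = rhs;
  }

  template <bool EXCLUSIVE>
  static void fold(RHS& rhs1, RHS rhs2) {
    if (later(rhs2, rhs1)) rhs1 = rhs2;
  }
};
```

Because the fold is commutative and associative, every subscriber that reads the reduced value sees the same last arrival, which is what lets each node log it without looking at any other node's logs.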

It would be good if Realm could provide support for barrier profiling. In particular it would be good to know the following information:

  1. When the last arrival occurred
  2. If there was an event precondition for that arrival, what it was
  3. The finish event of the Realm operation that performed the last arrival
  4. When the barrier triggered on the owner node
  5. When the subscription notification made it to this node where we're doing the profiling
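
To make the list concrete, the five items could be carried in a single record along these lines. This is a sketch only: `BarrierProfilingResponse`, `EventId`, and all field names are assumptions made up for illustration, and no such type exists in Realm today.

```cpp
#include <cstdint>

// Hypothetical stand-in for a Realm event handle.
typedef std::uint64_t EventId;

// One possible layout for a barrier profiling response, mirroring the
// five items listed above. All names here are illustrative assumptions.
struct BarrierProfilingResponse {
  std::int64_t last_arrival_time_ns;        // (1) when the last arrival occurred
  bool has_precondition;                    // (2) whether that arrival had a precondition...
  EventId precondition;                     //     ...and which event it was (valid only if has_precondition)
  EventId finish_event;                     // (3) finish event of the operation that performed the last arrival
  std::int64_t owner_trigger_time_ns;       // (4) when the barrier triggered on the owner node
  std::int64_t local_notification_time_ns;  // (5) when the subscription notification reached this node
};

// Delay between the last arrival and the trigger on the owner node.
inline std::int64_t arrival_to_trigger_ns(const BarrierProfilingResponse& r) {
  return r.owner_trigger_time_ns - r.last_arrival_time_ns;
}

// Delay between the owner-side trigger and the notification arriving locally.
inline std::int64_t trigger_to_notification_ns(const BarrierProfilingResponse& r) {
  return r.local_notification_time_ns - r.owner_trigger_time_ns;
}
```

The two helpers show why items 1, 4, and 5 are worth having together: they split the barrier's latency into the part spent reaching the owner and the part spent propagating the trigger back out.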

The hard part will be defining a model for profiling responses that lets users control how many profiling responses they get and on which nodes. I suspect the following model might be a good one:

  1. Users have to opt-in to barrier profiling when they create the barrier. Just opting in to profiling though does not generate any profiling responses anywhere.
  2. Users can request a profiling response for a specific barrier generation of the barrier at any time on any node and each request will produce one profiling response with the above information once that generation of the barrier has triggered. It will be an error to request a profiling response if the barrier was not configured to support profiling.
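
The two rules above can be sketched as a small single-node model. This is not Realm code: `ProfiledBarrierModel`, `GenerationProfile`, and the callback-based interface are hypothetical, and the sketch additionally assumes that the record for every triggered generation is retained so a request made after the trigger can still be answered.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical per-generation record; a real response would carry more fields.
struct GenerationProfile {
  std::uint64_t generation;
  std::int64_t trigger_time_ns;
};

// Toy model of the proposed opt-in scheme (all names are assumptions).
class ProfiledBarrierModel {
public:
  typedef std::function<void(const GenerationProfile&)> Callback;

  // Rule 1: profiling is chosen at creation and produces no responses by itself.
  explicit ProfiledBarrierModel(bool profiling_enabled)
    : enabled_(profiling_enabled) {}

  // Rule 2: each request yields exactly one response for one generation.
  void request_profile(std::uint64_t gen, Callback cb) {
    if (!enabled_)
      throw std::logic_error("barrier was not created with profiling enabled");
    std::map<std::uint64_t, GenerationProfile>::iterator it = completed_.find(gen);
    if (it != completed_.end()) {
      cb(it->second);  // generation already triggered: respond immediately
      return;
    }
    pending_[gen].push_back(std::move(cb));  // respond once it triggers
  }

  // Called when a generation triggers; answers every request waiting on it.
  void on_trigger(std::uint64_t gen, std::int64_t time_ns) {
    if (!enabled_) return;  // nothing is recorded without opt-in
    GenerationProfile p = {gen, time_ns};
    completed_[gen] = p;    // assumption: records are kept for all generations
    std::map<std::uint64_t, std::vector<Callback> >::iterator it = pending_.find(gen);
    if (it == pending_.end()) return;
    std::vector<Callback> waiting = std::move(it->second);
    pending_.erase(it);
    for (Callback& cb : waiting) cb(p);
  }

private:
  bool enabled_;
  std::map<std::uint64_t, GenerationProfile> completed_;
  std::map<std::uint64_t, std::vector<Callback> > pending_;
};
```

In this model the number of responses equals the number of requests, so the user controls both the volume and the nodes involved; the cost is the `completed_` map, which grows by one entry per generation for as long as the barrier lives.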

Assigning to @apryakhin for triaging and delegation.

apryakhin commented 3 months ago

> This approach has the benefit of incurring the least amount of overhead when profiling and doing the most analysis offline, but it is prone to running out of memory on larger runs where we need to load lots of logs upfront to be able to do the analysis.

Just to be clear, the OOM we are getting is because of the profiler and not because of storing/grabbing profiling info from the reduction results?

apryakhin commented 3 months ago

> It would be good if Realm could provide support for barrier profiling. In particular it would be good to know the following information:

I made a crude attempt at getting some of this information while manually profiling barriers a while ago, and I agree that we should probably have an "automated" way to do that. Unclear at which point we are going to want "the profiling support" given that you already have a solution that gets you to a certain point. However, considering that we have an implementation to scale barrier arrivals/broadcast with p2p active messages... perhaps we may want to have the profiling support for that first before we move forwards with it.

eddy16112 commented 3 months ago

> Users can request a profiling response for a specific barrier generation of the barrier at any time on any node

What if the barrier on another node has already passed the generation when we request the profiling response on our node?

lightsighter commented 3 months ago

> Just to be clear, the OOM we are getting is because of the profiler and not because of storing/grabbing profiling info from the reduction results?

The OOM is occurring during post-processing of the logfiles by Legion Prof and not during the execution of the program. The problem is that the size of the graph needed to represent the Realm event graph is too big to fit in memory.

> Unclear at which point we are going to want "the profiling support" given that you already have a solution that gets you to a certain point.

Right, I have a work-around for now which relies on the barrier reduction mechanism.

> perhaps we may want to have the profiling support for that first before we move forwards with it.

I would actually probably prefer that we get that done first and then maybe add this barrier profiling support on top of that once it is ready, especially since we already have a work-around for the moment (assuming the work-around continues to work at scale).

> What if the barrier on other node has already passed the generation when we request the profiling response on our node?

I'm assuming that the implementation will store the profiling responses for all the generations indefinitely, similar to how it stores the reduction results of the barrier indefinitely. Yes, this is inefficient, but it's something the user opts into with an understanding of the costs, similar to how they opt into using a reduction operator with a barrier.

apryakhin commented 1 month ago

@eddy16112