[EPIC] Realm profiling - Githubissues

apryakhin commented 1 month ago

We currently do not have a dedicated way to profile Realm. At this stage, the primary "client" of Realm is Legion, which uses its own custom profiling solution. For analyzing Realm, we rely on the existing logging infrastructure, which may not always be the best approach, particularly when dealing with complex applications.

This is an umbrella issue where I propose addressing the following questions: What specific information are we missing today? And, in conjunction with this, what data is the current Legion profiler lacking? Does it provide all the insights needed to fully understand what's happening within Realm?

If we reach a consensus on the need for a custom Realm profiler, we can document the requirements and assess existing solutions to determine if any are suitable, for instance Tracy. Additionally, we've made an initial attempt to integrate Realm with Nsight Profiler by adding a library to inject NVTX tags. We need to decide whether we want to continue developing this integration. If so, will it be sufficient for all users? If not, should we maintain it alongside another profiling solution? There are also a number of older attempts scattered across different branches, for example the bgwork profiler and other unmaintained profiling infrastructure, that need to be looked into.

On a related note, would it be worth considering the development of a generic profiling layer within Realm that could provide "adapters" for whichever solution we choose to support (if we choose to support any)?
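To make the "adapters" idea concrete, here is a minimal sketch of what such a layer could look like. Every name in it (ProfEvent, ProfAdapter, MemoryAdapter, ProfLayer) is hypothetical and does not correspond to anything in Realm today; it only illustrates a single emit point fanning out to pluggable backends.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type; not an existing Realm structure.
struct ProfEvent {
  std::string name;
  std::uint64_t start_ns;
  std::uint64_t stop_ns;
};

// Hypothetical adapter interface: one implementation per supported backend.
class ProfAdapter {
public:
  virtual ~ProfAdapter() = default;
  virtual void record(const ProfEvent &ev) = 0;
};

// Example backend that simply keeps events in memory.
class MemoryAdapter : public ProfAdapter {
public:
  void record(const ProfEvent &ev) override { events.push_back(ev); }
  std::vector<ProfEvent> events;
};

// The generic layer forwards each event to every registered adapter.
class ProfLayer {
public:
  void add_adapter(std::shared_ptr<ProfAdapter> a) {
    adapters.push_back(std::move(a));
  }
  void emit(const ProfEvent &ev) {
    for (auto &a : adapters) a->record(ev);
  }

private:
  std::vector<std::shared_ptr<ProfAdapter>> adapters;
};
```

An NVTX or Legion Prof backend would then be another ProfAdapter subclass, and the instrumentation points inside the runtime would only ever call emit.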

cc @muraj @elliottslaughter @manopapad @lightsighter @eddy16112 @magnatelee

manopapad commented 1 month ago

IMHO it is definitely desirable to have Realm dump pre-digested profiling information (replacing / in addition to profiling callbacks), for visualization in legion-prof, to support the profiling of pure-Realm applications. AFAIU most of the concepts in the profiler should apply to Realm (with some exceptions, e.g. critical path information).

elliottslaughter commented 1 month ago

A couple thoughts:

To the extent that any Realm information is currently missing in Legion Prof, I think that should be the highest priority. I have no issue with you integrating any other solutions that you want, but Legion Prof is the only system that presents high-quality information about Legion end-to-end. To the extent that there are parts of Realm that we cannot see into at the moment, it would provide a lot more value to have those all in one place rather than finding a separate solution which we then need to use in combination with Legion Prof to get value for most users.
I realize that doesn't give you a dedicated profiling solution for Realm. It's at least conceivable that you could generate Legion Prof format logs, either by lowering some of the logging infrastructure into Realm, or doing a standalone bare bones implementation yourselves. The model is not that complicated; the majority of the complication comes from the richness of Legion's programming model (which Realm is not obligated to use). Anyway, I'll just mention that as an option.
For whatever it's worth, Legion Prof already provides multiple backends, but we've found that the others are often lackluster. E.g. we found that Google Trace Viewer (used for TensorFlow) simply will not load profiles beyond a certain size. The new NVTXW integration for Legion Prof is very basic and doesn't provide the full fidelity of information available to Legion Prof, nor do I think it will do so any time soon (as the models are too different). While Realm is lower-level than Legion, it seems to me that Realm has more in common with Legion than it does with, say, MPI, so a lot of pre-existing solutions may not work that well for Realm.

lightsighter commented 1 month ago

Does it provide all the insights needed to fully understand what's happening within Realm?

The answer to this is definitely a 'no'.

What specific information are we missing today? And, in conjunction with this, what data is the current Legion profiler lacking?

I think the main information that we are missing is:

When are the background workers actually busy (not just spinning/polling)?
When they are busy, what are they busy doing (e.g., what kinds of background work items)?
When is the network "congested", e.g. GASNet/UCX are having trouble putting messages on the wire?
When the network is congested, what are the breakdowns of messages and where are they going? Is it only specific endpoints that are congested or is it all of them?
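For the first two questions, a rough sketch of the accounting involved might look like the following. It is not based on the existing bgwork code; WorkerActivity and its methods are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic. Time spent inside a work item counts as busy (broken down by kind), and everything else is treated as spinning/polling.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-worker accounting; names are illustrative, not Realm APIs.
class WorkerActivity {
public:
  // Called when the worker starts executing a work item of the given kind.
  void begin_item(const std::string &kind, std::uint64_t now_ns) {
    current_kind = kind;
    item_start_ns = now_ns;
    in_item = true;
  }
  // Called when the item finishes; time outside items counts as polling.
  void end_item(std::uint64_t now_ns) {
    if (!in_item) return;
    busy_ns_by_kind[current_kind] += now_ns - item_start_ns;
    in_item = false;
  }
  // Total time spent inside work items, across all kinds.
  std::uint64_t busy_ns() const {
    std::uint64_t total = 0;
    for (const auto &kv : busy_ns_by_kind) total += kv.second;
    return total;
  }
  // Fraction of the interval [t0, t1] spent doing real work.
  double utilization(std::uint64_t t0_ns, std::uint64_t t1_ns) const {
    if (t1_ns <= t0_ns) return 0.0;
    return static_cast<double>(busy_ns()) / static_cast<double>(t1_ns - t0_ns);
  }

  std::map<std::string, std::uint64_t> busy_ns_by_kind;

private:
  std::string current_kind;
  std::uint64_t item_start_ns = 0;
  bool in_item = false;
};
```

The network questions (3 and 4) need different counters and are not covered by this sketch.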

On a related note, would it be worth considering the development of a generic profiling layer within Realm that could provide "adapters" for whichever solution we choose to support (if we choose to support any)?

I think at a minimum, this information should be somehow available through Realm's existing ProfilingResponse interface so that clients can record it however they like (including Legion). If we want to build separate profiling infrastructure and stuff that is fine too, but profiling needs to continue to be dynamically available at runtime, and come back through the profiling response interface.

I realize that doesn't give you a dedicated profiling solution for Realm. It's at least conceivable that you could generate Legion Prof format logs, either by lowering some of the logging infrastructure into Realm, or doing a standalone bare bones implementation yourselves. The model is not that complicated; the majority of the complication comes from the richness of Legion's programming model (which Realm is not obligated to use). Anyway, I'll just mention that as an option.

I agree that I think Legion Prof can be used as a visualizer for Realm programs. I think most of the relationships that Legion describes today could even be inferred from general Realm programs (see below). I think if you ignore the Legion-specific logging statements then Legion Prof should be able to be used as a profiler for general Realm applications. If Legion Prof doesn't work just rendering Realm-only programs then we should do what we need to do in order to fix that.

AFAIU most of the concepts in the profiler should apply to Realm (with some exceptions, e.g. critical path information)

Actually the critical path stuff that is there now should work on generic Realm applications too. You'll need to do some extra logging, like for every Processor::spawn call recording the Processor::get_current_finish_event of the task that called Processor::spawn. Independently, you'd need to record every instance that has an accessor made by a particular task (also associated with the Processor::get_current_finish_event).
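As a sketch of the spawn-side bookkeeping described here (and only that part; instance/accessor tracking is omitted), one could key each spawn by finish events. EventId and SpawnLog are hypothetical stand-ins, with a plain integer used in place of a real Realm event handle, and the assumption is that the spawn call yields the child's finish event.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using EventId = std::uint64_t;  // stand-in for a Realm event handle

// Hypothetical log of spawn relationships keyed by finish events.
class SpawnLog {
public:
  // child_finish: finish event of the spawned task (assumed to come from spawn).
  // parent_finish: finish event of the task that issued the spawn.
  void record_spawn(EventId child_finish, EventId parent_finish) {
    parent_of[child_finish] = parent_finish;
  }
  // Walk from a task back to the root through its chain of spawners.
  std::vector<EventId> spawn_chain(EventId finish) const {
    std::vector<EventId> chain{finish};
    auto it = parent_of.find(finish);
    // The size check guards against looping forever on malformed (cyclic) input.
    while (it != parent_of.end() && chain.size() <= parent_of.size()) {
      chain.push_back(it->second);
      it = parent_of.find(it->second);
    }
    return chain;
  }

private:
  std::unordered_map<EventId, EventId> parent_of;
};
```

A critical path analysis would combine chains like this with per-task timing and event dependencies; the sketch only shows how the "who spawned whom" edges can be recovered.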

syamajala commented 1 month ago

The network profiling stuff would probably be super helpful for the slingshot-11 issues we're seeing.

I would personally still like some sort of optional "live mode" for profiling realm as mentioned in #1607 as we very frequently have no information at all when we are trying to debug hangs and freezes at the scale of 2048, 4096, and 8192 nodes. It's very frustrating.

lightsighter commented 1 month ago

I would personally still like some sort of optional "live mode" for profiling realm as mentioned in https://github.com/StanfordLegion/legion/issues/1607 as we very frequently have no information at all when we are trying to debug hangs and freezes at the scale of 2048, 4096, and 8192 nodes

I don't think this is the right place to have that discussion. (Probably deserves its own issue because of the complexity involved; I don't even know how you would make that work since a lot of what we need to do involves needing the whole profile logs.) It does motivate the need for Realm to continue to make all of its profiling results available dynamically through the profiling response interface though.

apryakhin commented 4 weeks ago

I realize that doesn't give you a dedicated profiling solution for Realm. It's at least conceivable that you could generate Legion Prof format logs, either by lowering some of the logging infrastructure into Realm, or doing a standalone bare bones implementation yourselves.

Using Legion Prof to profile Realm does sound reasonable given the two have a lot in common and significant effort has been put into developing Legion Prof. However, I am not convinced about the effort it may take. @elliottslaughter Is there a good pointer you can provide about the Legion Prof logging formats and some details on how it's all being stored right now?

eddy16112 commented 4 weeks ago

Using Legion Prof to profile Realm does sound reasonable given the two have a lot in common and significant effort has been put into developing Legion Prof.

That means we need to extend the realm profiling API for this internal profiling data, such as active messages, bgwork, etc. There could be two issues:

Sending this data to the requester (the node where the profiling task is executed) is costly. It could be much bigger than what the current profiling API provides today.
The profiling API is a user-level API, which means realm applications have to do whatever Legion does today: call the profiling API, collect data into memory, and stream data into files.

I think for realm applications, we need to directly dump the profiling data into files (in legion prof, nvtx, or whatever formats), then users can turn on the profiling easily just like how we enable legion profiling.

apryakhin commented 4 weeks ago

I think for realm applications, we need to directly dump the profiling data into files

This is exactly what I was talking about by asking about the format and existing infrastructure that is used to manage the underlying storage.

elliottslaughter commented 4 weeks ago

The questions about generating the profiling format are for @lightsighter, as this is all code that resides in Legion at the moment. I suppose it's possible that code could be factored out to make it generally available to other users. (I don't personally think it's worth duplicating since the format is fairly specific.)

eddy16112 commented 4 weeks ago

@elliottslaughter I do not think the format @apryakhin mentioned is the Legion format. If I understand it correctly, we will bypass Legion and let realm dump its profiling data into files that can be read by the Legion Prof viewer directly.

elliottslaughter commented 4 weeks ago

That's the format that legion_prof reads....

If you want to know how to write to that format, there is exactly one place we currently do that: in the Legion code.

If you want to see the parser side code, you can look at Rust, but I think it's more informative to read the serializer code that actually writes the data (which also happens to be in C++).

If you're talking about the archive format produced by legion_prof archive, then no, I do not recommend generating that. Full stop. There is a vast amount of business logic going into the process of generating that format, which is exactly why I don't recommend that anyone read the Rust State directly. And similarly, writing to that format would mean duplicating all that logic.

eddy16112 commented 4 weeks ago

I feel like we might need to deal with the archive format if we pick the Legion Prof viewer, because the data generated by the current realm profiling API does not 100% align with Legion Prof, e.g. we do not have the notion of meta tasks or mapper calls in realm, so we cannot directly use the rust parser to read realm data.

elliottslaughter commented 4 weeks ago

@lightsighter addressed this here: https://github.com/StanfordLegion/legion/issues/1777#issuecomment-2423382208

I agree that I think Legion Prof can be used as a visualizer for Realm programs. I think most of the relationships that Legion describes today could even be inferred from general Realm programs (see below). I think if you ignore the Legion-specific logging statements then Legion Prof should be able to be used as a profiler for general Realm applications. If Legion Prof doesn't work just rendering Realm-only programs then we should do what we need to do in order to fix that.

Mike and I are in agreement on this. And like I said above, a lot of the value-add for Legion Prof is in the processing stage, so you're really missing out (and duplicating work) if you skip that. The UI is nice, and has a lot of relevant usability features (besides being very scalable), but it's fundamentally a dumb viewer. It just shows what you tell it to show. All the fancy business logic is in the core legion_prof and the analysis required to generate the dump in the first place.

lightsighter commented 4 weeks ago

The Legion Prof logging format is dirt simple. There's just a bunch of structures for various kinds of logging statements, those get dumped into a zlib file, and then Legion Prof parses each of them on the other side. For a stand-alone Realm application you should be able to completely ignore all the Legion-specific logging statements (which are a minority of the statements) and just use the general Realm logging statements and Legion Prof should still be able to render things. If there are exceptions to that @elliottslaughter or I will fix them. I'm even willing to build the library on top of Realm's existing interface to do this for generic Realm programs myself if it's not obvious how to do it.

I still think the most important problem we should be discussing here is how to get the data for the four questions I asked in my previous comment exposed through the Realm profiling response interface. That is going to be the hard thing to figure out. Figuring out how to render the data will be easy in comparison.

eddy16112 commented 4 weeks ago

I still think the most important problem we should be discussing here is how to get the data for the four questions I asked in my https://github.com/StanfordLegion/legion/issues/1777#issuecomment-2423382208 exposed through the Realm profiling response interface.

I think 1 and 2 are not difficult to get. I do not have an answer for 3 and 4, because I am not quite familiar with them. However, I am not sure if we want to expose them via the realm profiling API. For the bgwork, if we create a profiling response every time we pick up a bgwork item, there might be too many of them. Actually, I do not think realm users want to use the profiling API to get such internal profiling data, e.g. the bgwork, because there is not even a public API for bgwork. Network might be a different story, because we may attach that network data to realm copies.

As I said in my previous comment https://github.com/StanfordLegion/legion/issues/1777#issuecomment-2436478763, I do not think realm profiling API is the right way for realm applications. I think the current realm Profiling API is more like CUPTI, where applications can use it to get online profiling data, but if CUDA users do not use CUPTI, they can still use nsight to profile CUDA applications.

I'm even willing to build the library on top of Realm's existing interface to do this for generic Realm programs myself if it's not obvious how to do it.

I am not sure if I understand it. Realm applications do not have to use the Profiling API, but just run with your library?

lightsighter commented 4 weeks ago

For the bgwork, if we create a profiling response every time we pick up a bgwork item, there might be too many of them. Actually, I do not think realm users want to use the profiling API to get such internal profiling data, e.g. the bgwork, because there is not even a public API for bgwork.

I'm suggesting that we make such a public API for profiling bgwork. I agree that it can't give a response for every single bgwork item as that will be overwhelming, hence the reason that designing this well will be challenging.

I do not think realm profiling API is the right way for realm applications. I think the current realm Profiling API is more like CUPTI, where applications can use it to get online profiling data, but if CUDA users do not use CUPTI, they can still use nsight to profile CUDA applications.

I think all Realm profiling data should be available dynamically. @syamajala made a prescient comment about potentially wanting to do online rendering of the profile. This isn't something that Legion Prof supports today, but is something that would be good to support in the near future. Legion mappers might also want dynamic information about what's going on with the background worker threads as well. I think we should always support dynamic online profiling solutions and that will also enable offline profiling as well.

I am not sure if I understand it. Realm applications do not have to use the Profiling API, but just run with your library?

Yes, the library will provide drop-in replacements for many Realm API calls and the user will use those instead of canonical Realm API calls. The library will then capture all the needed data and add necessary profiling requests to get all the information that it needs and log it out to zlib files. It would be really nice if Realm had hooks for API calls (like MPI has with PMPI) so we could easily capture this information, but I think we can work around that for now.
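The drop-in replacement pattern can be sketched without Realm at all. Below, the namespace mini is a made-up stand-in for the underlying runtime and pmini is the profiling wrapper with an identical signature; neither reflects the actual Realm or wrapper library interfaces, and the real library would also attach profiling requests and write logs rather than keep an in-memory vector.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the underlying runtime API (hypothetical, not Realm).
namespace mini {
inline std::uint64_t spawn(int task_id) {
  static std::uint64_t next_event = 1;
  (void)task_id;
  return next_event++;  // pretend this is the spawned task's finish event
}
}  // namespace mini

// Profiling wrapper exposing the same signature as the wrapped call.
namespace pmini {
struct SpawnRecord {
  int task_id;
  std::uint64_t finish_event;
};

// Process-wide log of every spawn that went through the wrapper.
inline std::vector<SpawnRecord> &spawn_log() {
  static std::vector<SpawnRecord> log;
  return log;
}

inline std::uint64_t spawn(int task_id) {
  std::uint64_t ev = mini::spawn(task_id);  // forward to the wrapped call
  spawn_log().push_back({task_id, ev});     // capture what the profiler needs
  return ev;
}
}  // namespace pmini
```

Because the wrapper keeps the same names and signatures, application code only has to switch the namespace it calls into, which is the property that PMPI-style hooks would otherwise provide without source changes.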

eddy16112 commented 4 weeks ago

I am OK with online profiling. I am just worried that if we want to expose so much internal data via the online profiling API, it will be a challenge to keep the overhead as minimal as possible. We need to cache data in memory, and then send it to the node where the response task is executed.

Regarding the PMPI style hooks, I guess it won't work until we have an ABI-stable API.

elliottslaughter commented 4 weeks ago

Maybe I'm missing something, but I believe the profiling API only responds to what you ask it for. So this would then be on Legion (or the standalone Realm profiling mode) to determine how much to ask for. We already have other tradeoffs in Legion where you can turn on modes that enable more profiling data to be collected.

Fundamentally, if you're chasing down an issue in Realm bgwork tasks, you need to see those in the profile. Right now we can't see them at all. I'm not sure whether we should make them visible by default, but having it as an option seems important in the long run.

lightsighter commented 3 weeks ago

I am OK with online profiling. I am just worried that if we want to expose so much internal data via the online profiling API, it will be a challenge to keep the overhead as minimal as possible. We need to cache data in memory, and then send it to the node where the response task is executed.

Maybe I'm missing something, but I believe the profiling API only responds to what you ask it for

Both of these are touching on why doing this kind of profiling is challenging. I think we'll need to define different "resolutions" of profiling for the client to ask for, since some people will want all the data regardless of the cost and other people will want to see a summary of the data in a "compressed" form that throws away some information but ensures that we log less data and minimize overheads to the actual application execution. We're going to need some knobs to turn to adjust the granularity of the profiling data we want to record, but it should be up to the user to declare what kind of resolution they want.

Regarding the PMPI style hooks, I guess it won't work until we have an ABI-stable API.
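One possible shape for such a knob, purely as an illustration: a recorder that always maintains a small per-kind summary and only retains individual samples when the client asks for full resolution. Resolution, Sample, Stats, and Recorder are hypothetical names, not part of the existing profiling interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class Resolution { Full, Summary };  // hypothetical client-selected knob

struct Sample {
  std::string kind;
  std::uint64_t duration_ns;
};

struct Stats {
  std::uint64_t count = 0;
  std::uint64_t total_ns = 0;
  std::uint64_t max_ns = 0;
};

class Recorder {
public:
  explicit Recorder(Resolution r) : res(r) {}

  void record(const std::string &kind, std::uint64_t duration_ns) {
    // The summary is cheap: its size is bounded by the number of kinds.
    Stats &s = summary[kind];
    s.count += 1;
    s.total_ns += duration_ns;
    s.max_ns = std::max(s.max_ns, duration_ns);
    // Individual samples are kept only when the client asked for everything.
    if (res == Resolution::Full) samples.push_back({kind, duration_ns});
  }

  std::vector<Sample> samples;           // filled only at Full resolution
  std::map<std::string, Stats> summary;  // always maintained

private:
  Resolution res;
};
```

At Summary resolution the amount of data shipped back to the requester stays constant regardless of how many items were processed, which is the trade-off being described: less information, but bounded logging cost.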

Understood. We don't need it right away. Would just be a nice feature to have eventually as it would make it possible to have this library be a completely drop-in replacement without any code changes.

apryakhin commented 3 weeks ago

When are the background workers actually busy (not just spinning/polling) ?

That has already been done by Sean in the "custom bgwork profiler" branch.

When they are busy, what are they busy doing (e.g., what kinds of background work items) ?

Fundamentally that's close to item 4 (see below) in terms of how to handle this. We have access to all bgwork info and can collect/organize the data.

When is the network "congested", e.g. GASNet/UCX are having trouble putting messages on the wire?

I am pretty sure this is something GASNet/UCX know how to determine now. Shouldn't it be as simple as comparing the number of outstanding requests submitted to the backend against the internal limit on the number of outstanding requests? I don't see a problem getting this information unless I am not thinking broadly enough about the problem statement.
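The "outstanding requests vs. limit" idea can be sketched as a simple gauge. This is an assumption about how congestion could be detected, not a description of what GASNet or UCX expose; OutstandingGauge and its methods are hypothetical.

```cpp
#include <cstddef>

// Hypothetical gauge; the real backends may expose this very differently.
class OutstandingGauge {
public:
  explicit OutstandingGauge(std::size_t limit) : limit_(limit) {}

  // Returns false when the request would exceed the limit and has to wait.
  bool try_submit() {
    if (outstanding_ >= limit_) {
      ++stalls_;  // count how often senders were held back
      return false;
    }
    ++outstanding_;
    return true;
  }

  // Called when the backend reports that a request has completed.
  void complete() {
    if (outstanding_ > 0) --outstanding_;
  }

  bool congested() const { return outstanding_ >= limit_; }
  std::size_t stalls() const { return stalls_; }

private:
  std::size_t limit_;
  std::size_t outstanding_ = 0;
  std::size_t stalls_ = 0;
};
```

Sampling congested() over time, or reporting the stall count per interval, would give a profile track showing when the network was backed up.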

When the network is congested, what are the breakdowns of messages and where are they going? Is it only specific endpoints that are congested or is it all of them?

This one is harder but not infinitely hard. To do this type of breakdown we just need to find a way to order and store all commit calls. The messages can probably be intercepted inside the active message layer. It perhaps needs some smart sampling strategy to keep only a subset of the most "important" ones. Is there anything else I am missing? @SeyedMir I'm just speculating, but I'm surprised that neither UCX nor GASNet already offers stats on this. It seems pretty fundamental; maybe they do, and we just need to look more closely.
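As a sketch of a per-endpoint breakdown with sampling, the following uses plain systematic 1-in-N sampling (not a "smart" strategy that picks important messages) and scales the counts back up for an estimate. EndpointStats, on_send, and the idea of calling it from the active message layer are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical per-endpoint message accounting with 1-in-N sampling.
class EndpointStats {
public:
  struct Counter {
    std::uint64_t messages = 0;
    std::uint64_t bytes = 0;
  };

  // sample_every == 1 records every message; 0 is treated as 1.
  explicit EndpointStats(std::uint64_t sample_every)
      : every(sample_every ? sample_every : 1) {}

  // Would be called wherever outgoing messages are committed.
  void on_send(int target_node, std::size_t bytes) {
    if (seen++ % every != 0) return;  // skip unsampled messages
    Counter &c = per_node[target_node];
    c.messages += 1;
    c.bytes += bytes;
  }

  // Scale sampled counts back up to estimate the true message count.
  std::uint64_t estimated_messages(int node) const {
    auto it = per_node.find(node);
    return it == per_node.end() ? 0 : it->second.messages * every;
  }

  std::map<int, Counter> per_node;

private:
  std::uint64_t every;
  std::uint64_t seen = 0;
};
```

Comparing the per-node counters then shows whether traffic is concentrated on a few endpoints or spread across all of them, at a recording cost that shrinks as the sampling interval grows.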

apryakhin commented 3 weeks ago

Okay, a couple of key points discussed so far, just to summarize:

The profiling information should be "at least" available via ProfilingResponse, which is the interface that already exists today. That means whatever new stats we add, we have to make sure those are reflected as part of this profiling interface. @lightsighter Is that accurate?
The recommendation is to go with Legion Prof as a viewer.
The recommendation is to make a library wrapping the canonical realm interface that would add the necessary profiling requests and dump the results into zlib files, which is essentially the format consumed by Legion Prof today. Depending on the amount of information produced and required, we are going to introduce a way to control the granularity of the profiling data.

lightsighter commented 2 weeks ago

That means whatever new stats we add, we have to make sure those are reflected as part of this profiling interface. @lightsighter Is that accurate?

Yes, that's correct. Whatever profiling data Realm collects, there should be a way to get at it dynamically through the profiling request interface.

The recommendation is to make a library wrapping the canonical realm interface that would add the necessary profiling requests and dump the results into zlib files, which is essentially the format consumed by Legion Prof today. Depending on the amount of information produced and required, we are going to introduce a way to control the granularity of the profiling data.

I've developed a prototype version of this library here: https://gitlab.com/StanfordLegion/legion/-/merge_requests/1519. It wraps the necessary parts of Realm's interface for profiling and dumps out data in the format that Legion Prof understands. It works for the few Realm programs that I've tried it on so far. All you need to do is replace all instances of Realm:: with PRealm::, replace #include realm.h with #include prealm.h, and link against the prealm library. I will provide a demo at a future Realm meeting.

StanfordLegion / legion

[EPIC] Realm profiling #1777