NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[QUESTION] How to Obtain Computation Model Graphs in Megatron-LM? #832

Open · fwyc0573 opened this issue 4 months ago

fwyc0573 commented 4 months ago

Hi everyone,

I'm currently working on a project involving Megatron-LM, and I'm looking for a way to obtain the computation graphs of the sub-models after partitioning, along with the attributes of their operators. I've tried tools such as torch.fx, as well as `torch.compile` and TorchDynamo from PyTorch 2.0, but I've run into several issues. Some of these problems seem to be related to the custom operators implemented in Megatron-LM.

Could anyone provide a feasible solution or guidance on how to achieve this? Any help would be greatly appreciated!

Thank you in advance!
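For reference, here is a minimal sketch of one way to collect per-operator attributes from a sub-model, assuming the sub-model can be symbolically traced by torch.fx (the custom Megatron-LM operators are exactly what may prevent this). It traces the module, runs torch.fx's `ShapeProp` pass once on an example input, and reads the recorded shape and dtype from each node's `meta["tensor_meta"]`. `TinyMLP` and `operator_table` are hypothetical names used only for illustration; they are not Megatron-LM APIs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp


class TinyMLP(nn.Module):
    """Hypothetical stand-in for one partitioned sub-model (an MLP block)."""

    def __init__(self, hidden: int = 8, ffn: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(hidden, ffn)
        self.fc2 = nn.Linear(ffn, hidden)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))


def operator_table(module: nn.Module, example_input: torch.Tensor):
    """Trace `module` with torch.fx and return one attribute dict per graph node."""
    gm = symbolic_trace(module)
    # ShapeProp executes the graph once on real inputs and stores a
    # TensorMetadata record (shape, dtype, ...) in node.meta["tensor_meta"].
    ShapeProp(gm).propagate(example_input)
    rows = []
    for node in gm.graph.nodes:
        meta = node.meta.get("tensor_meta")
        rows.append({
            "name": node.name,
            "op": node.op,  # placeholder / call_module / call_function / output
            "target": str(node.target),
            "shape": tuple(meta.shape) if meta is not None else None,
            "dtype": meta.dtype if meta is not None else None,
        })
    return gm, rows


if __name__ == "__main__":
    gm, rows = operator_table(TinyMLP(), torch.randn(2, 8))
    print(gm.graph)  # the traced computation graph
    for row in rows:
        print(row)
```

Parameter counts or FLOP estimates could be added per row in the same loop by looking up `gm.get_submodule(node.target)` for `call_module` nodes.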

github-actions[bot] commented 2 months ago

Marking as stale. No activity in 60 days.

robotsp commented 2 months ago

I ran into a similar problem when tracing a Megatron-LM transformer model into a graph, also with torch.fx: some customized modules are recognized as the wrong type. Could you share what specific problem you ran into and how you solved it? Thanks! @fwyc0573
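One common torch.fx workaround for custom modules that trace incorrectly is to subclass `torch.fx.Tracer` and override `is_leaf_module`, so that those modules are recorded as single opaque `call_module` nodes instead of being traced into. The sketch below assumes that is acceptable for your use case (the leaf module's internals will not appear in the graph). `CustomFusedOp`, `Block`, `LeafTracer`, and `trace_with_leaves` are hypothetical names for illustration; `CustomFusedOp` only mimics a Megatron-style fused module and is not the real implementation.

```python
import torch
import torch.nn as nn
import torch.fx as fx


class CustomFusedOp(nn.Module):
    """Hypothetical stand-in for a custom/fused module.

    The Python branch on a tensor value cannot be followed by symbolic
    tracing, so tracing *into* this module raises a TraceError.
    """

    def forward(self, x):
        if x.abs().max() > 1e4:  # data-dependent control flow
            x = x / 1e4
        return torch.softmax(x, dim=-1)


class Block(nn.Module):
    def __init__(self, hidden: int = 8):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)
        self.fused = CustomFusedOp()

    def forward(self, x):
        return self.fused(self.proj(x))


class LeafTracer(fx.Tracer):
    """Tracer that keeps selected module types as single call_module nodes."""

    def __init__(self, leaf_types=()):
        super().__init__()
        self.leaf_types = tuple(leaf_types)

    def is_leaf_module(self, m: nn.Module, module_qualified_name: str) -> bool:
        if isinstance(m, self.leaf_types):
            return True
        # Fall back to the default rule (built-in torch.nn modules are leaves).
        return super().is_leaf_module(m, module_qualified_name)


def trace_with_leaves(root: nn.Module, leaf_types=()) -> fx.GraphModule:
    tracer = LeafTracer(leaf_types)
    graph = tracer.trace(root)
    # tracer.root is set by trace(); GraphModule copies the referenced submodules.
    return fx.GraphModule(tracer.root, graph)


if __name__ == "__main__":
    gm = trace_with_leaves(Block(), leaf_types=(CustomFusedOp,))
    print(gm.graph)  # `fused` appears as one call_module node
```

Plain `fx.symbolic_trace(Block())` fails on the branch inside `CustomFusedOp`, while the leaf-aware tracer succeeds and the resulting `GraphModule` still computes the same outputs. The trade-off is that any operators inside a leaf module are hidden from graph-level analysis.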

github-actions[bot] commented 1 day ago

Marking as stale. No activity in 60 days.