apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Memory profiling enhancements #14686

Open anirudh2290 opened 5 years ago

anirudh2290 commented 5 years ago

I have created a prototype for visualizing the memory pools on the GPU. I have added a doc explaining the feature and how to use the prototype in the cwiki: https://cwiki.apache.org/confluence/display/MXNET/MXNet+Memory+Profiling+Enhancements

I would need some help making this prototype ready to be PR'ed.

There are more improvements that can be made, as mentioned in the cwiki. Some of them are listed here:

  1. Support for visualizing cuDNN memory allocation and frees
  2. Better visualization for CPU memory pools
  3. Support for MKLDNN memory allocation
  4. Parameter server, server and worker memory visualization

Let me know if you are interested.

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Feature

ChaiBapchya commented 5 years ago

Thanks for this feature idea. I'm interested in working on it. I will post a PR once I get this working.

anirudh2290 commented 5 years ago

Thanks @ChaiBapchya. I think it requires some additional testing, including some sanity performance tests. We also need to add a switch to toggle it in the profiler API. It would also be nice to test it in a distributed training setting, though that is not strictly required.

eric-haibin-lin commented 5 years ago

The system team from UofT developed https://github.com/tbd-ai/tbd-tools, which profiles the memory footprint for MXNet (http://www.sysml.cc/doc/2019/demo_24.pdf). @SerailHydra @olympian94 @izaakniksan @ArmageddonKnight If possible, we can reuse it and avoid duplicating work.

anirudh2290 commented 5 years ago

Thanks @eric-haibin-lin for the pointer! I will take a look.

SerailHydra commented 5 years ago

Hi all,

Thanks for your interest in our memory tools! I started building this tool for benchmarking purposes, and the version is only 0.11.0. Later, my colleagues used the same techniques to build it on newer versions of MXNet for optimization purposes.

The open-sourced one is based on a rather old version, so I am not sure how helpful it is, since the codebase has changed a lot. I think @ArmageddonKnight has the memory profiling tool for a newer version. If you need some input from us in person, we would be happy to help integrate the tool into the main branch.