MendelXu / ANN

semantic segmentation, pytorch, non-local
Apache License 2.0
312 stars 63 forks

tool for computation and memory statistics #4

Closed yassouali closed 4 years ago

yassouali commented 4 years ago

First of all thank you for this wonderful work, and for providing the implementation.

Feel free to close this issue if this is not the right place to ask such questions. I really liked your computation and memory statistics, and I was wondering what tool you used to obtain them.

Thank you very much

MendelXu commented 4 years ago

Please refer here for all the details.

  1. For memory statistics, we use torch.cuda.max_memory_allocated(). This function returns the maximum GPU memory allocated over its whole lifetime, so we have to measure the modules we want to count in increasing order of memory use, to make sure the reported value is actually that of the current module.
  2. For GFLOPs statistics, we use torchstat. However, since some operations, such as MatMul and AdaptiveAvgPool, are not covered by the original repository, we added support for them.
  3. For total time statistics, we simply time the forward pass. To get a more accurate result, we call torch.cuda.synchronize() before reading the clock and run several rounds to average the time.
  4. For the time details of each operation, we use torch.autograd.profiler.profile().
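Points 1 and 3 can be sketched together as below. This is my own reconstruction of the recipe, not the repository's actual script; the helper name `measure` and the warmup/rounds parameters are assumptions. It falls back to CPU (skipping the memory readout) when no GPU is present.

```python
import time

import torch
import torch.nn as nn


def measure(module, x, rounds=10, warmup=3):
    """Hypothetical helper: average forward time and (on GPU) peak memory."""
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        module, x = module.cuda(), x.cuda()
        # Clear the peak counter so max_memory_allocated() reflects
        # only this module, mirroring the "increasing order" caveat above.
        torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        for _ in range(warmup):
            module(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for queued kernels before timing
        start = time.perf_counter()
        for _ in range(rounds):
            module(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for the timed kernels to finish
        avg_time = (time.perf_counter() - start) / rounds
    peak_mem = torch.cuda.max_memory_allocated() if use_cuda else None
    return avg_time, peak_mem


avg, mem = measure(nn.Linear(256, 256), torch.randn(8, 256))
```

Without the synchronize calls, CUDA kernels are launched asynchronously and the timer would only measure the launch overhead, not the actual computation.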
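For point 4, a minimal sketch of the autograd profiler (again my own example, not the repo's code) looks like this; it works on CPU as well:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
x = torch.randn(4, 64)

# Record every operator executed during the forward pass.
with torch.autograd.profiler.profile() as prof:
    model(x)

# key_averages() groups the recorded events by operator name;
# table() renders them as a sorted, human-readable summary.
table = prof.key_averages().table(sort_by="cpu_time_total")
print(table)
```

Recent PyTorch versions also offer torch.profiler as a newer interface, but torch.autograd.profiler.profile() is the one named above.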

I hope it's clear for you.

yassouali commented 4 years ago

Thank you very much for your response.