Swall0w / torchstat

Model analyzer in PyTorch

A possible solution to give gpu time duration statistics #7

Open mati1994 opened 5 years ago

mati1994 commented 5 years ago

This repo is very useful to me, so first of all, thank you to the author. Having looked through the code, I'd like to propose a possible way to output GPU time durations per operator. This is just my own idea and I'm not completely sure it gives precise results, so I hope anyone with a better approach will share it. Note that torch.autograd.profiler.profile can also measure GPU time, although the op names it reports are hard to recognize and it does not output duration percentages.
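For comparison, here is a minimal sketch of the torch.autograd.profiler.profile route (the model and input shape are arbitrary placeholders, not anything from torchstat):

    import torch
    import torchvision.models as models

    model = models.resnet18().cuda().eval()
    x = torch.rand(1, 3, 224, 224).cuda()

    # use_cuda=True records CUDA kernel times alongside CPU times.
    with torch.no_grad():
        with torch.autograd.profiler.profile(use_cuda=True) as prof:
            model(x)

    # The rows are low-level autograd ops, which is why they are hard to
    # map back to the model's module names.
    print(prof.key_averages().table(sort_by="cuda_time_total"))

With that caveat, here are the two changes I propose for torchstat itself: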

  1. Change the code

     x = torch.rand(1, *self._input_size)  # add module duration time

     (from https://github.com/Swall0w/torchstat/blob/master/torchstat/model_hook.py#L22) to

     x = torch.rand(1, *self._input_size).cuda()  # add module duration time

     so that the dummy input tensor is created on the GPU.

  2. Change the code

     start = time.time()
     output = self._origin_call[module.__class__](module, *input, **kwargs)
     end = time.time()

     (from https://github.com/Swall0w/torchstat/blob/master/torchstat/model_hook.py#L49) to

     torch.cuda.synchronize()
     start = time.time()
     output = self._origin_call[module.__class__](module, *input, **kwargs)
     torch.cuda.synchronize()
     end = time.time()

     (remember to add import torch in that file)
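The synchronize() calls matter because CUDA kernels are launched asynchronously: without them, time.time() would mostly measure launch overhead on the CPU side rather than actual GPU execution time. A standalone sketch of the same timing pattern (the layer and input shapes are arbitrary placeholders):

    import time
    import torch

    layer = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
    x = torch.rand(8, 64, 56, 56).cuda()

    torch.cuda.synchronize()  # wait for any pending work before starting the clock
    start = time.time()
    output = layer(x)
    torch.cuda.synchronize()  # wait for the kernel to actually finish
    end = time.time()
    print(f"forward took {(end - start) * 1000:.3f} ms")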

Having made the changes above, I tested the result and found that when the model is run for the first time, much of the total time is consumed by the first operator, which I guess is the cost of the memory copy from CPU to GPU and other one-time setup. If the measurement code is run twice in the same .py file, the result of the second run looks fine. I hope anyone who has an opinion on this will chime in :)
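If it helps, a common workaround for that first-run overhead is an untimed warm-up forward pass before collecting statistics, so that CUDA context creation and other lazy initialization are not attributed to the first operator. A minimal sketch of the idea (the model and input are placeholders; torchstat does not do this warm-up by itself):

    import torch
    import torchvision.models as models

    model = models.resnet18().cuda().eval()
    x = torch.rand(1, 3, 224, 224).cuda()

    # Untimed warm-up pass: absorbs CUDA context creation and other
    # one-time setup costs such as cuDNN initialization.
    with torch.no_grad():
        model(x)
    torch.cuda.synchronize()

    # A measurement pass run now should reflect steady-state per-operator time.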