Open wangq95 opened 4 years ago
I am also encountering this issue.
Same issue. To debug, I timed each portion of training and was able to identify the source of the largest delay: the computation of the SoftmaxCrossEntropyOHEMLoss occurs on the CPU: https://github.com/Tramac/Fast-SCNN-pytorch/blob/0638517d359ae1664a27dfb2cd1780a40a06c465/utils/loss.py#L60
On my workstation and my training data, training takes ~1.1 seconds/image. Execution of the `forward` method in `class SoftmaxCrossEntropyOHEMLoss(nn.Module)` alone accounts for ~60% of that duration.
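For anyone who wants to reproduce this kind of per-stage measurement, here is a minimal sketch of a timing helper. `StageTimer` and the stage names are hypothetical (they are not part of the repository), and the optional `sync` callback is an assumption to cover asynchronous GPU work: passing `torch.cuda.synchronize` makes the clock wait for queued kernels to finish.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per named stage."""

    def __init__(self, sync=None):
        # sync: optional zero-argument callable run before reading the clock,
        # e.g. torch.cuda.synchronize when timing GPU work (assumption).
        self.sync = sync
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Return each stage's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}


# Usage inside a training loop (model, criterion, etc. are your own objects):
#   timer = StageTimer(sync=torch.cuda.synchronize)
#   with timer.stage("forward"):
#       outputs = model(images)
#   with timer.stage("loss"):
#       loss = criterion(outputs, targets)
#   print(timer.shares())
```

Without the synchronize call, time spent in GPU kernels can be attributed to whichever later stage happens to block first, so the breakdown may be misleading.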
Has anyone come up with a more efficient implementation of this OHEM loss? Possibly translating the ops from numpy to torch to be able to run on GPU?
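One possible direction is sketched below: an OHEM cross-entropy written only with torch ops, so it runs on whatever device the logits live on. This is not the repository's implementation and may not match its behavior exactly. The class name `OhemCrossEntropyLoss`, its defaults, and the selection rule (keep pixels whose true-class probability is at most `thresh`, but never fewer than `min_kept` pixels) are assumptions based on the common OHEM formulation.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class OhemCrossEntropyLoss(nn.Module):
    """Hypothetical torch-only OHEM loss; a sketch, not the repo's class."""

    def __init__(self, ignore_index=-1, thresh=0.7, min_kept=100000):
        super().__init__()
        self.ignore_index = ignore_index
        self.thresh = thresh      # probability threshold for "hard" pixels
        self.min_kept = min_kept  # lower bound on the number of kept pixels

    def forward(self, logits, target):
        # logits: (N, C, H, W) float, target: (N, H, W) long
        pixel_loss = F.cross_entropy(
            logits, target, ignore_index=self.ignore_index, reduction="none"
        ).reshape(-1)
        valid = target.reshape(-1) != self.ignore_index
        pixel_loss = pixel_loss[valid]
        if pixel_loss.numel() == 0:
            # No labeled pixels: return a zero that is still on the graph.
            return logits.sum() * 0.0

        # loss = -log(p_true), so p_true <= thresh  <=>  loss >= -log(thresh)
        loss_thresh = -math.log(self.thresh)
        sorted_loss, _ = torch.sort(pixel_loss, descending=True)
        k = max(1, min(self.min_kept, sorted_loss.numel()))
        # If fewer than k pixels pass the threshold, lower the cutoff to the
        # k-th largest loss so that at least k pixels are kept.
        cutoff = sorted_loss[k - 1].clamp(max=loss_thresh)
        hard = sorted_loss[sorted_loss >= cutoff]
        return hard.mean()
```

Usage is the same as any other criterion, for example `loss = OhemCrossEntropyLoss()(model(images), targets)`. Sorting every pixel is simple but not free; `torch.topk` could be swapped in if the sort itself shows up in profiling.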
Hi @Tramac, I am using your code with the default settings and training is very slow. GPU utilization is below 10%, and a single mini-batch takes about 130 seconds to train. How can I solve this problem?