Tramac / Fast-SCNN-pytorch

A PyTorch Implementation of Fast-SCNN: Fast Semantic Segmentation Network
Apache License 2.0

Training is too slow #28

Open wangq95 opened 4 years ago

wangq95 commented 4 years ago

Hi, @Tramac, I use your code with the default settings and find that training is too slow. GPU utilization is below 10%, and each mini-batch takes about 130 seconds to train. How can this problem be solved?

Bidski commented 3 years ago

I am also encountering this issue.

SarBH commented 3 years ago

Same issue. To debug, I timed each portion of the training loop and identified the source of the largest delay: the computation of SoftmaxCrossEntropyOHEMLoss happens on the CPU: https://github.com/Tramac/Fast-SCNN-pytorch/blob/0638517d359ae1664a27dfb2cd1780a40a06c465/utils/loss.py#L60

On my workstation, with my training data, training takes ~1.1 seconds/image. The execution of the forward method of class SoftmaxCrossEntropyOHEMLoss(nn.Module) alone accounts for ~60% of that time.

Has anyone come up with a more efficient implementation of this OHEM loss? Possibly by translating the ops from numpy to torch so they can run on the GPU?
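A possible direction, sketched below under assumptions: the numpy-based hard-example selection can be expressed entirely in torch ops, so the whole loss stays on whatever device the logits live on. The class name, the `thresh`/`min_kept` defaults, and the exact selection rule here are my own choices for illustration, not the repository's interface; it keeps pixels whose true-class probability is below `thresh`, but at least `min_kept` of the hardest ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OHEMCrossEntropyLoss(nn.Module):
    """Online hard example mining cross-entropy in pure torch ops (sketch).

    Hypothetical GPU-friendly replacement, not the repo's implementation.
    Averages the loss over the hardest valid pixels: those whose predicted
    probability for the true class is below `thresh`, topped up to at least
    `min_kept` pixels when too few fall below the threshold.
    """

    def __init__(self, ignore_index=-1, thresh=0.7, min_kept=10000):
        super().__init__()
        self.ignore_index = ignore_index
        self.thresh = thresh
        self.min_kept = min_kept

    def forward(self, logits, target):
        # Per-pixel cross-entropy, no reduction: shape (N*H*W,) after flatten.
        pixel_losses = F.cross_entropy(
            logits, target, ignore_index=self.ignore_index, reduction="none"
        ).view(-1)

        valid = target.view(-1) != self.ignore_index
        pixel_losses = pixel_losses[valid]
        if pixel_losses.numel() == 0:
            # No valid pixels in this batch: return a zero that keeps the graph.
            return pixel_losses.sum()

        # True-class probability per pixel; clamp ignore labels to a legal
        # index for gather (their entries are dropped by the valid mask).
        with torch.no_grad():
            probs = F.softmax(logits, dim=1)
            true_prob = probs.gather(
                1, target.clamp(min=0).unsqueeze(1)
            ).view(-1)[valid]
            n_hard = int((true_prob < self.thresh).sum())

        # Keep the k hardest pixels. Since loss = -log(true_prob) here,
        # largest losses correspond to smallest true-class probabilities.
        k = min(max(n_hard, self.min_kept), pixel_losses.numel())
        topk_losses, _ = pixel_losses.topk(k)
        return topk_losses.mean()
```

Because everything is a tensor op, calling this on CUDA tensors never leaves the GPU; when `min_kept` is at least the number of valid pixels it degenerates to the ordinary mean cross-entropy over non-ignored pixels.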