cardwing / Codes-for-Lane-Detection

Learning Lightweight Lane Detection CNNs by Self Attention Distillation (ICCV 2019)
MIT License

Runtime of ERFNet-CULane-PyTorch #171

Closed. Yibin122 closed this issue 5 years ago

Yibin122 commented 5 years ago

After several experiments, I found that the forward pass is very fast, taking less than 10 ms on my machine with a 1060 Ti. But copying data from GPU to CPU is really time-consuming: it costs more than 10 ms for just a 1x5x208x976 tensor. See https://github.com/cardwing/Codes-for-Lane-Detection/blob/master/ERFNet-CULane-PyTorch/test_erfnet.py#L114

I am pretty new to PyTorch and am confused by this basic operation. Is it the same on your side? Can you share your observations?

Thanks!

cardwing commented 5 years ago

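You need to add torch.cuda.synchronize() before reading the timer. CUDA calls are asynchronous, so without it the forward pass looks faster than it is and the GPU-to-CPU copy appears to absorb the remaining time.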
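
A minimal sketch of timing the two steps separately, with torch.cuda.synchronize() around each timed region. This is not the code from test_erfnet.py: the `timed` helper, the single `Conv2d` layer standing in for ERFNet, and the 3-channel input are assumptions chosen only to reproduce the 1x5x208x976 output shape. It falls back to the CPU when no GPU is available, where synchronization is not needed.

```python
import time

import torch


def timed(fn, device):
    """Run fn() and return (result, elapsed seconds).

    On CUDA, synchronize before starting and before stopping the clock so
    that only the work launched by fn() is measured.
    """
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    result = fn()
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return result, time.perf_counter() - start


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for ERFNet: one conv layer producing 5 output channels.
model = torch.nn.Conv2d(3, 5, kernel_size=3, padding=1).to(device).eval()
x = torch.randn(1, 3, 208, 976, device=device)  # assumed input shape

with torch.no_grad():
    model(x)  # warm-up run, excluded from timing
    out, t_forward = timed(lambda: model(x), device)
    out_cpu, t_copy = timed(lambda: out.cpu(), device)

print(f"forward: {t_forward * 1e3:.2f} ms, GPU-to-CPU copy: {t_copy * 1e3:.2f} ms")
```

Measured this way, the forward-pass figure should include the actual GPU compute, and the copy figure should reflect only the transfer.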

Yibin122 commented 5 years ago

Got it!