Closed: Yibin122 closed this 5 years ago
After several experiments, I found that the forward pass is very fast, taking less than 10 ms on my machine with a 1060 Ti. But copying data from GPU to CPU is really time-consuming: it costs more than 10 ms for just a 1x5x208x976 tensor: https://github.com/cardwing/Codes-for-Lane-Detection/blob/master/ERFNet-CULane-PyTorch/test_erfnet.py#L114
I am pretty new to PyTorch and confused by this basic operation. Is it the same on your side? Can you share your observations?
Thanks!
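You need to add torch.cuda.synchronize() before reading the timer. CUDA kernels are launched asynchronously, so without it the forward pass appears to finish almost immediately and the GPU-to-CPU copy, which has to wait for the kernels to complete, gets charged for their run time.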
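For illustration, here is a minimal sketch of how the timing could be fenced with torch.cuda.synchronize(). It is not the code from test_erfnet.py: the helper name time_forward_and_copy and the small Conv2d model (a stand-in for ERFNet that just produces a 1x5x208x976 output) are assumptions made for the example, and it falls back to CPU when no GPU is available.

```python
import time

import torch


def time_forward_and_copy(model, x, device):
    """Return (forward_seconds, copy_seconds, output_on_cpu).

    Hypothetical helper: it synchronizes before each timer read so that
    asynchronous CUDA work is charged to the step that launched it.
    """
    def sync():
        # Only meaningful on CUDA; CPU execution is already synchronous.
        if device.type == "cuda":
            torch.cuda.synchronize(device)

    model = model.to(device).eval()
    x = x.to(device)
    with torch.no_grad():
        model(x)  # warm-up run so one-time setup cost is not measured
        sync()

        t0 = time.perf_counter()
        out = model(x)
        sync()  # wait for the forward kernels to actually finish
        t1 = time.perf_counter()

        out_cpu = out.cpu()  # device-to-host copy (blocks until done)
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1, out_cpu


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Assumption: a tiny stand-in network, not ERFNet itself.
model = torch.nn.Conv2d(3, 5, kernel_size=3, padding=1)
x = torch.randn(1, 3, 208, 976)

fwd, copy, out_cpu = time_forward_and_copy(model, x, device)
print(f"forward: {fwd * 1e3:.2f} ms, copy: {copy * 1e3:.2f} ms")
```

With the synchronize call in place, the forward time should include the real kernel execution, and the copy time should reflect only the transfer of the output tensor.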
Got it!