donnyyou / torchcv

TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision
https://pytorchcv.com
Apache License 2.0

Question about the inference speed of SFNet #156

Closed wondervictor closed 4 years ago

wondervictor commented 4 years ago

Hi, thanks for your great work and contribution. I'm interested in your work SFNet; I trained a model (SFNet with ResNet18) and achieved a comparable validation result. However, I'm confused about how to reach the inference speed reported in your paper (18/26 FPS) with this codebase. I inserted a timer into segmentation_test.py and obtained an inference speed of 0.143 s/image for SFNet (single scale, ResNet18), which is much slower than the numbers in the paper. Could you provide some clues on how to reproduce the inference speed reported in the paper (without TensorRT)?
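For reference, the timing was done roughly along these lines (a minimal sketch with CUDA synchronization; `net` and `image` are placeholders, not the actual variable names in segmentation_test.py):

```python
import time
import torch

def time_single_gpu(net, image, iters=100, warmup=10):
    """Rough per-image latency of `net` on one GPU."""
    net.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up passes, not timed
            net(image)
        torch.cuda.synchronize()         # finish any pending GPU work first
        start = time.perf_counter()
        for _ in range(iters):
            net(image)
        torch.cuda.synchronize()         # wait for all timed forward passes
    elapsed = (time.perf_counter() - start) / iters
    print(f"{elapsed:.4f} s/image ({1.0 / elapsed:.1f} FPS)")
    return elapsed
```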

lxtGH commented 4 years ago

@wondervictor Hi! Thanks for your interest. The current config is wrong: the output stride is 8. Also, the three FPN DSN heads are included, which slows down inference. @donnyyou. I tested the speed with os=32 and no FPN DSN heads; it can reach over 17 FPS, depending on your device, without the TensorRT speedup.

wondervictor commented 4 years ago

Thank you (@lxtGH) very much for your reply. I wonder whether the performance will drop with os=32, and how to change the output stride of SFNet, which has an FPN-style structure.

lxtGH commented 4 years ago

@wondervictor Hi! I think the main advantage of our work is that it closes the accuracy gap between os=32 and os=8 while remaining fast.

wondervictor commented 4 years ago

@lxtGH Can I switch to os=32 by changing this line? https://github.com/donnyyou/torchcv/blob/98c7299411943ae66d7be64a8103bf61e0d9b17a/model/seg/nets/sfnet.py#L124

wondervictor commented 4 years ago

Hi @lxtGH, I've found the culprit. There is considerable latency (about 0.08 s) from DataParallel when testing the current implementation on a single GPU. After unwrapping the net from DataParallel, the inference time drops to 0.0607 s per image on an RTX 2080Ti. Thanks for your help!
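For anyone else measuring single-GPU speed, this is roughly what the unwrapping looks like (a sketch, assuming the test script builds the model as `net = nn.DataParallel(net)`; the helper name is mine, not from the codebase):

```python
import torch.nn as nn

def unwrap_dataparallel(net):
    """Return the underlying model if `net` was wrapped in nn.DataParallel."""
    if isinstance(net, nn.DataParallel):
        return net.module        # DataParallel keeps the wrapped model here
    return net

# e.g. net = unwrap_dataparallel(net).cuda().eval() before timing on one GPU
```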

lxtGH commented 4 years ago

OK, thanks for the reminder. I didn't know DataParallel would cause such a runtime slowdown.

pauls3 commented 2 years ago

> Hi @lxtGH, I've found the culprit. There is considerable latency (about 0.08 s) from DataParallel when testing the current implementation on a single GPU. After unwrapping the net from DataParallel, the inference time drops to 0.0607 s per image on an RTX 2080Ti. Thanks for your help!

Hi @wondervictor, I don't know PyTorch too well. How were you able to unwrap the net from DataParallel?