wondervictor closed this issue 4 years ago
@wondervictor Hi! Thanks for your interest. The current config is wrong since its output stride is 8, and it also includes the three FPN DSN heads, which slow down inference (cc @donnyyou). I tested the speed with os=32 and no FPN DSN heads; it can reach above 17 FPS, depending on your device, without TensorRT speedup.
Thank you very much for your reply, @lxtGH. I wonder whether the performance will drop with os=32, and how to change the output stride of SFNet, which has an FPN-style structure.
@wondervictor Hi! I think the main advantage of our work is that it closes the accuracy gap between os=32 and os=8 while staying fast.
@lxtGH Can I switch to os=32 by changing this line? https://github.com/donnyyou/torchcv/blob/98c7299411943ae66d7be64a8103bf61e0d9b17a/model/seg/nets/sfnet.py#L124
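For context, output stride is usually controlled by whether the last backbone stages downsample with stride or keep resolution with dilation. A generic torchvision sketch, not the torchcv line linked above (ResNet-50 is used here only because torchvision's BasicBlock, used by ResNet-18, does not support dilation):

```python
from torchvision.models import resnet50

# os=32: every stage keeps its stride-2 downsampling (fastest, coarsest).
net_os32 = resnet50(replace_stride_with_dilation=[False, False, False])

# os=8: the last two stages replace stride with dilation, so the final
# feature map is 4x larger per spatial dimension (slower, finer).
net_os8 = resnet50(replace_stride_with_dilation=[False, True, True])
```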
Hi @lxtGH, I've found the devil. There is a considerable latency (about 0.08 s) in `DataParallel` when testing with the current implementation on a single GPU. After unwrapping the net from `DataParallel`, the inference time drops to 0.0607 s per image on an RTX 2080 Ti.
Thanks for your help!
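For reference, a minimal sketch of what "unwrapping" means here, assuming the model was wrapped with `nn.DataParallel` as in the current test code (the `net` below is a hypothetical stand-in for the real SFNet model):

```python
import torch.nn as nn

net = nn.Linear(8, 8)            # hypothetical stand-in for the real model
wrapped = nn.DataParallel(net)   # wrapper used by the current test code

# Even on a single GPU, forwarding through the wrapper pays its
# scatter/gather overhead. nn.DataParallel keeps the original model
# in its `.module` attribute, so "unwrapping" is just:
unwrapped = wrapped.module
assert unwrapped is net          # same underlying model, no wrapper
```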
OK, thanks for the reminder. I didn't know `DataParallel` would cause a runtime slowdown.
Hi @wondervictor, I don't know PyTorch too well. How were you able to unwrap the net from DataParallel?
Hi, thanks for your great work and contribution. I'm interested in your work SFNet. I trained a model (SFNet with ResNet-18) and achieved a comparable validation result. However, I'm confused about how to reach the inference speed reported in your paper (18/26 FPS) with this codebase. I inserted a timer into `segmentation_test.py` and obtained an inference speed for SFNet (single scale, ResNet-18) of 0.143 s/image, which is much slower than the paper reports. Could you provide some clues on how to reach the paper's inference speed (without TensorRT)?
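For anyone reproducing this measurement, a minimal timing sketch (illustrative only, not the actual `segmentation_test.py` code; assumes a CUDA device). The `torch.cuda.synchronize()` calls matter because CUDA kernels run asynchronously, so timing without them can be badly off:

```python
import time
import torch

def measure_inference_time(net, image, warmup=10, iters=100):
    """Average single-image forward time for `net` on a CUDA device."""
    net.eval()
    with torch.no_grad():
        for _ in range(warmup):       # warm up cuDNN / CUDA kernels
            net(image)
        torch.cuda.synchronize()      # drain pending GPU work
        start = time.perf_counter()
        for _ in range(iters):
            net(image)
        torch.cuda.synchronize()      # wait for all forwards to finish
        per_image = (time.perf_counter() - start) / iters
    return per_image

# e.g. t = measure_inference_time(net, torch.randn(1, 3, 1024, 2048).cuda())
# print(f"{t:.4f} s/image ({1.0 / t:.1f} FPS)")
```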