Closed tjulyz closed 5 years ago
I think we should make a fair comparison by prefixing the command with `CUDA_LAUNCH_BLOCKING=1`. Then the speeds can be compared fairly, because PyTorch launches CUDA kernels asynchronously by default.
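Because asynchronous kernel launches return before the GPU finishes, a wall-clock timer must be bracketed by an explicit synchronization call to measure real inference time. A minimal sketch of that pattern (here `run_once` stands in for one forward pass and `synchronize` for something like `torch.cuda.synchronize` — both are placeholder hooks, not part of this repo):

```python
import time

def time_inference(run_once, synchronize=lambda: None, warmup=5, iters=50):
    """Average per-call latency in seconds.

    run_once:    callable performing one forward pass (placeholder).
    synchronize: callable that blocks until queued GPU work finishes,
                 e.g. torch.cuda.synchronize; a no-op suffices on CPU.
    """
    for _ in range(warmup):   # amortize one-time setup (context, autotune)
        run_once()
    synchronize()             # drain queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    synchronize()             # wait for the last kernel before stopping
    return (time.perf_counter() - start) / iters
```

Without the two `synchronize()` calls, the timer only measures how fast kernels are *enqueued*, which is why unsynchronized numbers look unrealistically fast.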
Sorry, when I add `CUDA_LAUNCH_BLOCKING=1`, the speed becomes really slow... Have you tested your speed with this flag?
This flag allows a fair comparison of detector speeds, because the timing code must be synchronized between the CPU and GPU. To keep the detector fast, we didn't use it. We also speed up the detector by saving post-processing time through multiprocessing, so the total time is only slightly longer than the CNN time.
So, could you please share a breakdown of the CNN inference time and the NMS time of your network?
I plan to list the speeds of several one-stage detectors (including SSD, DSSD, RetinaNet, RefineDet, CornerNet and M2Det) measured with `CUDA_LAUNCH_BLOCKING=1` next month.
That is really great! I often find that NMS time has a large influence on the final speed. For example, SSD has very fast network inference but slow NMS. So I think it would be better to compare speeds with the NMS time and the network time reported separately. I am just curious how long network inference takes for M2Det. RefineDet takes 9 ms for the network and 6 ms for NMS (I can only get 12 ms+ for NMS).
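To get that kind of per-stage breakdown, the forward pass and NMS can be timed separately with a synchronization point around each stage. A rough sketch under the same caveat as before (`stages` and `synchronize` are illustrative placeholders, not an API from this repo):

```python
import time

def profile_stages(stages, inp, iters=20, synchronize=lambda: None):
    """Average per-stage latency in seconds.

    stages: list of (name, fn) pairs, e.g. [("cnn", forward), ("nms", nms)];
            each fn consumes the previous stage's output (placeholders).
    synchronize: blocks until queued GPU work finishes (no-op on CPU).
    """
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(iters):
        x = inp
        for name, fn in stages:
            synchronize()                 # settle previously queued work
            t0 = time.perf_counter()
            x = fn(x)
            synchronize()                 # charge this stage for its own kernels
            totals[name] += time.perf_counter() - t0
    return {name: total / iters for name, total in totals.items()}
```

This makes it easy to see cases like the one described above, where a detector's network is fast but NMS dominates the end-to-end latency.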
Great! Thanks!
Hi, great work! I have a question about the test speed. As RFBNet reported, they achieve 15 ms per image and SSD gets 22 ms, while the speed of M2Det is nearly 30 ms. Have you checked the speed? Using PyTorch 0.4.0+, I cannot reproduce that speed. In your paper, SSD runs at 43 FPS. Was that tested on PyTorch 0.4.1, as yours was? Thank you very much!