lupotto opened 5 years ago
Hi @eric612,
My main problem is when I want to test the speed of the network using the demo. My detections take around 1 s per image. I tried detecting my images with mobilenet-yolov3 using the COCO weights that you provide, and it takes around 1 s per image. I also tried my Caffe model trained on my own dataset, and it has similar performance. With mobilenet-yolov3-lite it's much faster (~20 ms). Do you know what I should do?
Can you share your demo script? I think it is a GPU/CPU mode problem.
I have it at my office, but I used exactly the same script as demo_yolo_lite.sh; I just changed the deploy.prototxt and .caffemodel. Moreover, I checked with `watch nvidia-smi`
whether my GPU was working.
When I ran the script, one caffe process was using my GPU memory. I don't understand why it works well with demo_yolo_lite but not with demo_yolov3.
My lite-version caffemodel had its batch norm layers merged. A merged model may run up to 2x faster than a non-merged one. And mobilenet-yolov3 is about 4x slower than the lite version because of the different input resolution.
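For anyone curious how the batch norm merge works: at inference time a BatchNorm (+Scale) layer is an affine per-channel transform, so it can be folded into the weights and bias of the preceding convolution. Below is a minimal numpy sketch of that folding (not the exact tool used in this repo, just the standard math), verified on a 1x1 conv where the convolution reduces to a matrix multiply:

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm (+Scale) layer into the preceding convolution.

    W: conv weights, shape (out_ch, in_ch, kh, kw)
    b: conv bias, shape (out_ch,) (use zeros if the conv has no bias)
    gamma, beta, mean, var: per-channel BN/Scale parameters, shape (out_ch,)
    Returns (W', b') such that conv(x, W') + b' == BN(conv(x, W) + b).
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel multiplier
    W_folded = W * scale[:, None, None, None]   # scale each output filter
    b_folded = (b - mean) * scale + beta        # shift the bias accordingly
    return W_folded, b_folded

# Tiny numerical check with a 1x1 conv (equivalent to a matmul):
rng = np.random.default_rng(0)
out_ch, in_ch = 4, 3
W = rng.standard_normal((out_ch, in_ch, 1, 1))
b = rng.standard_normal(out_ch)
gamma, beta = rng.standard_normal(out_ch), rng.standard_normal(out_ch)
mean, var = rng.standard_normal(out_ch), rng.random(out_ch) + 0.1

x = rng.standard_normal(in_ch)
conv_out = W[:, :, 0, 0] @ x + b
bn_out = gamma * (conv_out - mean) / np.sqrt(var + 1e-5) + beta

Wf, bf = fold_bn_into_conv(W, b, gamma, beta, mean, var)
merged_out = Wf[:, :, 0, 0] @ x + bf
assert np.allclose(bn_out, merged_out)
```

Since the merged network has fewer layers to launch per forward pass, it saves the BN/Scale kernel overhead, which is where the speed-up comes from.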
So I think your model's inference time should be near 160 ms. Maybe I need to check the logs; if you have the prototxt and caffemodel, that would be better, and I can try it on my computer.
I can share it with you as soon as I get to my office. Could you please give me your e-mail?
Thank you so much!
I sent it to your gmail, please check it.
I didn't receive it yet. Could you send it to lupotto46@gmail.com?
Ok, I resent it.
@eric612 I'm very interested in the BN absorbing. Could you please share that code? Thanks!
@lupotto, as I see it, you spend too much time in the expensive convolutions at the 1/8 scale, like conv19~21. If they are necessary, I suggest decreasing the channel number to less than 64, or using a bottleneck architecture.
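To see why a bottleneck helps at the 1/8 scale, it's enough to count multiply-accumulates. The sketch below uses illustrative numbers (a 52x52 map, i.e. 1/8 of a 416 input, and 128 channels; the real channel counts of conv19~21 may differ) and compares a plain 3x3 conv against a 1x1-reduce / 3x3 / 1x1-expand bottleneck:

```python
def conv_macs(h, w, in_ch, out_ch, k):
    """Multiply-accumulates for a k x k conv on an h x w feature map
    (stride 1, 'same' padding, no groups)."""
    return h * w * in_ch * out_ch * k * k

# Illustrative numbers: a 52x52 map (1/8 scale of a 416 input), 128 channels.
h = w = 52
plain = conv_macs(h, w, 128, 128, 3)          # one plain 3x3 conv

# Bottleneck: 1x1 reduce to 64, 3x3 at 64 channels, 1x1 expand back to 128.
bottleneck = (conv_macs(h, w, 128, 64, 1)
              + conv_macs(h, w, 64, 64, 3)
              + conv_macs(h, w, 64, 128, 1))

print(plain, bottleneck, plain / bottleneck)  # roughly a 2.8x MAC reduction
```

The same logic explains the advice to keep channels under 64 at that scale: MACs grow with the product of input and output channels, and the 1/8-scale map still has many spatial positions.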
@NEU-Gou Sorry that I didn't say it clearly; the speed-up was not 2x. Here was my test:
Network input resolution 416, GTX 1080, this project, testing yolov3-lite-bn and bn-merged:
- bn-merged: first image costs 15 ms, the others cost 7 ms
- yolov3-lite-bn: first image costs 29 ms, the others cost 11 ms
So it was actually not 2x, but at least 1.5x.
@eric612 Thanks for the clarification. The speed improvement is very impressive. Is it possible to share the BN absorb tool you're using? Thanks!
@NEU-Gou, I just modified the code from MobileNet-SSD so that it can automatically produce the prototxt. As I see it, I don't have any contribution to the inference speed itself.
@eric612 ,
Thanks for the suggestion. It finally solved the speed problem. Thanks!
@eric612 Thanks for pointing me to the right code. I will give it a try.
You can see this issue.