eric612 / MobileNet-YOLO

A caffe implementation of MobileNet-YOLO detection network

Speed problem mobilenet-yolov3 own dataset #60

Open lupotto opened 5 years ago

eric612 commented 5 years ago

You can see this issue

lupotto commented 5 years ago

Hi @eric612,

My main problem is the speed of the network when I test it using the demo: my detections take around 1 s per image. I tried detecting my images with mobilenet-yolov3 using the COCO weights that you provide, and it takes around 1 s per image; my caffemodel trained on my own dataset has similar performance. With mobilenet-yolov3-lite it's much faster (~20 ms). Do you know what I should do?

eric612 commented 5 years ago

Can you share your demo script? I think it is a GPU/CPU mode problem.

lupotto commented 5 years ago

I have it at my office, but I used exactly the same script as demo_yolo_lite.sh; I just changed the deploy.prototxt and .caffemodel. Moreover, I checked with `watch nvidia-smi` that my GPU was working: while the script was running, one caffe process was using my GPU memory. I don't understand why it works well with demo_yolo_lite but not with demo_yolov3.

eric612 commented 5 years ago

My lite-version caffemodel has the batchnorm layers merged; it can be up to 2x faster than the non-merged model. Also, mobilenet-yolov3 is about 4x slower than the lite version because of its larger input resolution.

So I think your model's inference time should be near 160 ms. Maybe I need to check the logs; if you have the prototxt and caffemodel, that would be better, and I could try it on my computer.
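For reference, merging batchnorm into the preceding convolution is a standard transform: the conv weights and bias are rescaled by the BN statistics so that conv → BN collapses into a single conv that produces identical outputs. A minimal numpy sketch (the function name and the eps default are illustrative, not taken from this repo's merge tool):

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding conv layer.

    w: conv weights, shape (out_ch, in_ch, kh, kw)
    b: conv bias, shape (out_ch,)
    gamma, beta, mean, var: per-output-channel BN parameters, shape (out_ch,)

    Returns (w_folded, b_folded) such that a conv with these parameters
    computes the same output as conv -> BN.
    """
    scale = gamma / np.sqrt(var + eps)           # per-channel BN scale
    w_folded = w * scale[:, None, None, None]    # rescale each output filter
    b_folded = (b - mean) * scale + beta         # shift the bias accordingly
    return w_folded, b_folded
```

At deploy time the BatchNorm (and Scale) layers are then simply dropped from the prototxt, which is where the per-image savings come from.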

lupotto commented 5 years ago

I can share it with you as soon as I get to my office. Could you please give me your e-mail?

Thank you so much!

eric612 commented 5 years ago

I sent it to your Gmail; please check it.

lupotto commented 5 years ago

I didn't receive it yet. Could you send it to lupotto46@gmail.com?

eric612 commented 5 years ago

OK, I resent it.

NEU-Gou commented 5 years ago

@eric612 I'm very interested in the BN absorption. Could you please share that code? Thanks!

eric612 commented 5 years ago

@NEU-Gou Sorry that I didn't say it clearly; the speedup was not exactly 2x. Here was my test:

Network input resolution 416, GTX 1080, this project, testing yolov3-lite-bn against the bn-merged model:

  1. bn-merged: the first image cost 15 ms, the rest 7 ms each
  2. yolov3-lite-bn: the first image cost 29 ms, the rest 11 ms each

So it was not actually a 2x speedup, but at least 1.5x.
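The first-image overhead in those numbers (15 ms vs 7 ms, 29 ms vs 11 ms) is typical one-time setup cost (e.g. cuDNN autotuning and GPU memory allocation), so benchmarks usually discard a few warmup runs before averaging. A generic timing helper along those lines (a sketch, not this project's test code; `infer` stands in for whatever forward call you want to benchmark):

```python
import time

def benchmark_ms(infer, n_warmup=3, n_runs=10):
    """Average latency of infer() in milliseconds, excluding warmup runs.

    infer: zero-argument callable that runs one forward pass.
    The first calls are discarded because they often include one-time
    setup (memory allocation, kernel autotuning) and are not
    representative of steady-state speed.
    """
    for _ in range(n_warmup):
        infer()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        infer()
        times.append((time.perf_counter() - t0) * 1000.0)
    return sum(times) / len(times)
```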

eric612 commented 5 years ago

@lupotto, as I see it, you spend too much time in the expensive convolutions at 1/8 scale, like conv19~21. If they are necessary, I suggest decreasing the channel number to less than 64, or using a bottleneck architecture.
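To see why a bottleneck helps at the 1/8 scale, compare multiply-accumulate counts for a plain 3x3 conv against a 1x1-reduce / 3x3 / 1x1-expand stack. The shapes below are hypothetical (a 52x52 map with 128 channels, roughly what a 416 input gives at 1/8 scale), not the actual conv19~21 dimensions:

```python
def conv_macs(h, w, in_ch, out_ch, k):
    """Multiply-accumulates for a stride-1 conv with 'same' padding."""
    return h * w * in_ch * out_ch * k * k

# Hypothetical 1/8-scale feature map for a 416x416 input: 52x52.
plain = conv_macs(52, 52, 128, 128, 3)            # plain 3x3, 128 -> 128

bottleneck = (conv_macs(52, 52, 128, 64, 1)       # 1x1 reduce to 64 ch
              + conv_macs(52, 52, 64, 64, 3)      # 3x3 on the narrow map
              + conv_macs(52, 52, 64, 128, 1))    # 1x1 expand back to 128

print(f"plain / bottleneck = {plain / bottleneck:.2f}x")  # prints about 2.77x
```

Most of the cost sits in the 3x3 kernel, so squeezing its channel count with cheap 1x1 convs cuts the total by roughly 2.8x at the same spatial resolution.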

NEU-Gou commented 5 years ago

> @NEU-Gou Sorry that I didn't say it clearly; the speedup was not exactly 2x. Here was my test:
>
> Network input resolution 416, GTX 1080, this project, testing yolov3-lite-bn against the bn-merged model:
>
>   1. bn-merged: the first image cost 15 ms, the rest 7 ms each
>   2. yolov3-lite-bn: the first image cost 29 ms, the rest 11 ms each
>
> So it was not actually a 2x speedup, but at least 1.5x.

@eric612 Thanks for the clarification; the speed improvement is very impressive. Is it possible to share the BN-absorb tool you're using? Thanks!

eric612 commented 5 years ago

@NEU-Gou, I just modified the code from mobilenet-ssd and made it automatically produce the merged prototxt. As far as I can see, I haven't contributed anything of my own to the inference speed.

lupotto commented 5 years ago

@eric612 ,

Thanks for the suggestion. It finally solved the speed problem. Thanks!

NEU-Gou commented 5 years ago

@eric612 Thanks for pointing me to the right code. I will give it a try.