albanie / mcnSSD

A matconvnet implementation of the Single Shot Detector
MIT License

Is 'ssd_demo.m' not multibox detection? #10

Closed 89douner closed 7 years ago

89douner commented 7 years ago

When I executed ssd_demo.m, I only got a single detection result. As far as I know, SSD is a multibox detector (multi-object detection). Do you have a plan to change or add code (for multi-detection) to mcnSSD? Also, do you have a plan to add any code for video input (real-time detection) to mcnSSD?

I will also try to write the code (for multi-detection and for video input) using mcnSSD.

albanie commented 7 years ago

The demo code is just an example which sorts the predictions by confidence and simply returns the top one (you can see this in the code here). In the standard evaluation code (e.g. for the pascal evaluation here), it can return up to 200 predictions per image. In terms of realtime detection, a couple of people have mentioned to me that they used it on video. On a Pascal NVIDIA GPU, with a 300x300 input image it runs at up to around 70 fps (or around 55-60 fps on older architectures).
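
Roughly speaking, keeping several boxes instead of just the top one amounts to something like the sketch below (the variable names are illustrative rather than the exact ones used in ssd_demo.m, and the prediction layout is an assumption):

% preds: N x 6 array of predictions, assumed to be laid out per row as
% [label, confidence, xmin, ymin, xmax, ymax] (illustrative layout)
confThresh = 0.5 ;                          % keep anything above 50% confidence
keep = preds(:,2) > confThresh ;            % logical mask over all predictions
dets = preds(keep,:) ;                      % several detections, not just the top one
[~, order] = sort(dets(:,2), 'descend') ;   % still sorted by confidence for display
dets = dets(order,:) ;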

89douner commented 7 years ago

ssd_demo_multi.zip

I fixed some code in ssd_demo.m and I succeeded in detecting and classifying multiple objects. The fixed code is 'ssd_demo_multi.m' (attached m-file).


However, I still do not know about the realtime (tracking) part. Could you tell me a little more about tracking objects using mcnSSD? For example, a reference site, or any useful technique like converting a video into frames.

You do not have to tell me too much. I just want to get a hint or an idea! Thanks..!!

P.S. Although I wanted to send you this reply through email, I didn't, because I don't know your email address.

albanie commented 7 years ago

Hi @89douner, thanks for the updated file. To make it clearer for users, I will update the demo to show the detection of multiple objects. With regards to tracking, SSD could be used for simple tracking-by-detection (i.e. simply run the detector on each frame independently, one at a time). However, it does not have a built-in mechanism for performing tracking across frames; for this you would need to add something extra to improve performance. You can extract frames directly from a video with the matlab function here.
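
A rough sketch of reading a video frame by frame with MATLAB's built-in VideoReader (runDetector below is just a placeholder for however you call the network on a single image, and the file name is made up):

vid = VideoReader('myVideo.mp4') ;    % hypothetical file name
while hasFrame(vid)
  frame = readFrame(vid) ;            % H x W x 3 uint8 image for this frame
  dets = runDetector(frame) ;         % placeholder for the per-image SSD call
  % ... draw or store the detections for this frame ...
end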

One simple approach could be to run the detector on all the frames, and then run a basic tracker such as KLT to join up the bounding boxes (see the sketch below). Unfortunately, I'm not too knowledgeable about tracking, so I'm not sure what the current best techniques are for this task :)
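
For what it's worth, a very rough sketch of the KLT idea using the Computer Vision Toolbox's vision.PointTracker; bbox and firstFrame are assumed to come from running the detector on the first frame, with bbox in [x y width height] format:

gray    = rgb2gray(firstFrame) ;
points  = detectMinEigenFeatures(gray, 'ROI', bbox) ;         % corner features inside the detection box
tracker = vision.PointTracker('MaxBidirectionalError', 2) ;
initialize(tracker, points.Location, firstFrame) ;
while hasFrame(vid)
  frame = readFrame(vid) ;
  [pts, valid] = step(tracker, frame) ;                       % new point locations in this frame
  trackedPts = pts(valid, :) ;                                % keep only reliably tracked points
  % ... fit a transform (e.g. with estimateGeometricTransform) to carry bbox forward ...
end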

89douner commented 7 years ago

Thanks for your reply. Your updated code, ssd_demo.m including multi-detection, is more useful. Also, the updated ssd_demo.m seems faster than the previous (single-detection) ssd_demo.m. To be specific, the previous (single-detection) ssd_demo.m took about 30 sec to show the figure in MATLAB, while your updated (multi-detection) ssd_demo.m takes only about 2-4 sec. I don't know exactly why the updated code is faster than the previous code, but it isn't important to me. (But if you do know the reason, please let me know.)

This is the main question... Your reply said SSD runs at 70 fps (frames per second). In theory, then, shouldn't it take ssd_demo.m less than 1 sec to detect and classify a test image (i.e. to show the figure in MATLAB)? Although the updated ssd_demo.m is an improvement, it still takes about 3 sec to classify a test image. So I think there may be a problem, because the task time (3 sec) in ssd_demo.m isn't equal to the SSD processing time (70 fps, i.e. about 0.014 sec per frame in theory). If you know why, please let me know. Thanks!

albanie commented 7 years ago

There are a few things going on here. Firstly, in the demo, there are several operations taking place (loading the model, loading the image, resizing the image, running the network, and plotting the figure). Each of these takes some time. For example, if I run on my machine in CPU mode, I get a breakdown like this:

model loading time: 0.85 seconds
imread time: 0.12 seconds
imresize time: 0.01 seconds
net eval time: 0.71 seconds
figure generation time: 0.02 seconds
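
One way to produce this kind of breakdown is simply to wrap each stage in tic/toc; a rough sketch, where loadModel and evalNetwork stand in for whatever actually loads and runs the network, and the file names are made up:

tic ; net = loadModel('ssd-model.mat') ;              % placeholder loader and model file
fprintf('model loading time: %.2f seconds\n', toc) ;
tic ; im = imread('test.jpg') ;                       % placeholder test image
fprintf('imread time: %.2f seconds\n', toc) ;
tic ; im = imresize(im, [300 300]) ;                  % SSD-300 input size
fprintf('imresize time: %.2f seconds\n', toc) ;
tic ; dets = evalNetwork(net, im) ;                   % placeholder forward pass
fprintf('net eval time: %.2f seconds\n', toc) ;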

We can re-run in GPU mode, but it's worth being aware that timing things on the GPU is a little tricky. For example, if you simply run an image through the network once, you will get something like this:

model loading time: 0.91 seconds
imread time: 0.12 seconds
imresize time: 0.01 seconds
move net to GPU time: 3.05 seconds
net eval time: 0.888 seconds
figure generation time: 0.04 seconds

The GPU looks slower than the CPU! What is going on here? The issue is that the first time you run code on the GPU, the execution is pretty slow (there are various factors as to why this is the case). If we re-run the evaluation a few times with a for loop, we get:

model loading time: 0.92 seconds
imread time: 0.12 seconds
imresize time: 0.01 seconds
move net to GPU time: 2.79 seconds
net eval time: 1.072 seconds
net eval time: 0.032 seconds
net eval time: 0.028 seconds
net eval time: 0.026 seconds
net eval time: 0.025 seconds
figure generation time: 0.04 seconds
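
The sort of loop meant here could look like the sketch below; evalNetwork is the same placeholder as above, and wait(gpuDevice) forces the GPU to finish before the timer is read, which matters because GPU calls are asynchronous:

gpuIm = gpuArray(single(im)) ;                      % move the preprocessed image to the GPU
dev = gpuDevice ;                                   % device handle, used for synchronisation
for ii = 1:5
  tic ;
  dets = evalNetwork(net, gpuIm) ;                  % placeholder forward pass, as above
  wait(dev) ;                                       % block until all queued GPU work is done
  fprintf('net eval time: %.3f seconds\n', toc) ;
end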

The first execution is slow, but the following ones are progressively quicker (now running at around 40 Hz). To get benchmarks comparable to the caffe code, I followed their approach and ran with a batch size of eight images (GPUs become much more efficient when processing batches of data in parallel):

model loading time: 0.95 seconds
imread time: 0.14 seconds
imresize time: 0.04 seconds
move net to GPU time: 2.85 seconds
net eval time: 1.326 seconds
net eval time: 0.118 seconds
net eval time: 0.118 seconds
net eval time: 0.114 seconds

Once it’s warmed up, it is processing 8 images in 0.114 seconds (i.e. 70.18 Hz). This is only on a Tesla M40 (it’s quicker on a Pascal). There is more involved in running a proper timing benchmark, though (it should include image loading and pre-processing, and should not re-compute on the same image, which can over-exploit the cache), but hopefully this gives you a rough idea of the relative timings :)
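
For reference, a sketch of stacking eight images into a single H x W x C x N input tensor before the forward pass (evalNetwork remains a placeholder, and the same test image is simply reused to fill the batch):

batchSize = 8 ;
batch = zeros(300, 300, 3, batchSize, 'single') ;        % H x W x C x N input tensor
for ii = 1:batchSize
  batch(:,:,:,ii) = single(imresize(im, [300 300])) ;    % reuse the same test image here
end
batch = gpuArray(batch) ;
tic ; dets = evalNetwork(net, batch) ; wait(gpuDevice) ;
t = toc ;
fprintf('net eval time: %.3f seconds (%.1f Hz)\n', t, batchSize / t) ;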