davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

Pedestrian detector #186

Closed sturkmen72 closed 8 years ago

sturkmen72 commented 8 years ago

I have trained a pedestrian detector on the INRIA dataset. It can be found at https://github.com/sturkmen72/dlib_pedestrian_detection

I will share more information about how to train it on the GitHub repo soon.

Many thanks to @davisking for the great library.

davisking commented 8 years ago

Sweet :) Feel free to add this to the dlib users wiki.

tej06 commented 7 years ago

I've been trying to train a pedestrian detector using the INRIA dataset as well. The training procedure seems to go fine, but I'm not able to detect anything, even on the training data. Does the annotation have anything to do with it? Also, how does merging the pictures help?

sturkmen72 commented 7 years ago

Can I see your training.xml?

tej06 commented 7 years ago

https://github.com/tej06/dlib-pedestrian-detection/blob/master/training.xml

I'm basically annotating each image separately with a fixed aspect ratio.

sturkmen72 commented 7 years ago

It is possible to create such a training.xml from INRIA's annotation information. (I have done this before and will share it when I find it.) How much time did you spend on it?

tej06 commented 7 years ago

I wrote a Python snippet to extract the information from INRIA's annotation files. It took me around 15 to 20 minutes. Some of the resulting annotations were not right, so I annotated those again using imglab. My problem is that after training, my detection rate is 0%.
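For readers following along, here is a minimal sketch of what such an extraction snippet might look like. It is not tej06's actual code. It assumes the INRIA annotation files use the PASCAL-style text layout shown in the comment, and that the corner coordinates are inclusive; `parse_inria_boxes` and the dictionary keys are names made up for this illustration.

```python
import re

# Assumed line layout in an INRIA annotation file (PASCAL text style):
#   Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (194, 127) - (413, 647)
BOX_RE = re.compile(
    r'Bounding box for object \d+ "(?P<label>[^"]+)".*?:\s*'
    r'\((?P<xmin>\d+),\s*(?P<ymin>\d+)\)\s*-\s*'
    r'\((?P<xmax>\d+),\s*(?P<ymax>\d+)\)'
)


def parse_inria_boxes(text):
    """Return one dict per bounding-box line found in the annotation text."""
    boxes = []
    for m in BOX_RE.finditer(text):
        xmin, ymin, xmax, ymax = (
            int(m.group(k)) for k in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append({
            "label": m.group("label"),
            "left": xmin,
            "top": ymin,
            # Assumption: corners are inclusive, hence the +1.
            "width": xmax - xmin + 1,
            "height": ymax - ymin + 1,
        })
    return boxes
```

The left/top/width/height keys mirror the attributes of a `<box>` element in an imglab-style XML file, so the result can be written out directly.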

sturkmen72 commented 7 years ago

I found the code I used before to generate the annotation file and generated four files with it. You can find them at https://github.com/sturkmen72/dlib_pedestrian_detection/tree/master/INRIA

I was planning to try creating a detector, but I have not found the free time yet.

NOTE: why there are four files.

There was only one file (inria_test2.xml) when I first created it. I modeled it on dlib/examples/faces/training.xml. It ignores rectangles that fall outside the image boundaries.

I created the other files hoping they would be useful for you.

inria_test2.xml contains all rectangles, even those outside the image boundaries. inria_test3.xml has labels and ignores rectangles that fall outside the image boundaries. inria_test4.xml has labels and contains all rectangles, even those outside the image boundaries.
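The two choices described above (dropping or keeping out-of-bounds rectangles, with or without labels) can be sketched roughly as follows. This is not sturkmen72's generator. It assumes the imglab-style layout used by dlib/examples/faces/training.xml (`<dataset>`, `<images>`, `<image file=...>`, `<box top left width height>` with an optional `<label>` child); `build_dataset` and its parameters are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_dataset(images, keep_out_of_bounds=False, with_labels=False):
    """Build an imglab-style XML string.

    images: list of (file_name, image_width, image_height, boxes), where
    each box is a dict with left, top, width, height and optionally label.
    """
    dataset = ET.Element("dataset")
    images_el = ET.SubElement(dataset, "images")
    for file_name, img_w, img_h, boxes in images:
        image_el = ET.SubElement(images_el, "image", file=file_name)
        for b in boxes:
            inside = (
                b["left"] >= 0
                and b["top"] >= 0
                and b["left"] + b["width"] <= img_w
                and b["top"] + b["height"] <= img_h
            )
            # Skip rectangles that leave the image unless asked to keep them.
            if not inside and not keep_out_of_bounds:
                continue
            box_el = ET.SubElement(
                image_el, "box",
                top=str(b["top"]), left=str(b["left"]),
                width=str(b["width"]), height=str(b["height"]),
            )
            if with_labels:
                ET.SubElement(box_el, "label").text = b.get("label", "person")
    return ET.tostring(dataset, encoding="unicode")
```

Calling it with the four combinations of `keep_out_of_bounds` and `with_labels` would give four variants comparable to the ones listed above.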

tej06 commented 7 years ago

Thanks a lot:) I'll train the detector with this.

How do overlapping annotations affect the training procedure?

sturkmen72 commented 7 years ago

As I said, I have not tried it yet. I am not very experienced in this field.

sturkmen72 commented 7 years ago

What is your purpose in training a pedestrian detector?

tej06 commented 7 years ago

It's a part of a project I'm working on.

sturkmen72 commented 7 years ago

Did you try OpenCV's HOGDescriptor?

tej06 commented 7 years ago

Yes, I did. It works well, but it's a bit fickle. Even with NMS it behaves erratically on video streams. I'm trying to achieve better detection using dlib. If this doesn't work out, I'll fall back to OpenCV's HOGDescriptor.
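For context on the NMS mentioned here, a generic greedy non-maximum suppression can be sketched as below. This is neither OpenCV's implementation nor tej06's code. The `(x1, y1, x2, y2)` box layout, the `0.5` threshold, and the function names are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Note that this operates on one frame at a time. It removes duplicate detections within a frame but does nothing to stabilize detections from one frame to the next.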

davisking commented 7 years ago

As I recall, INRIA doesn't label all the people in each image. That's why it's not working. There is also an issue of excessively tight cropping which is far from ideal. You want to train on images that look like the real testing images, not ones that are cropped in some obviously target biased way. But most importantly, the dlib training algorithm uses all non-labeled parts of the image as negatives, so if there are any people in the background it's not going to work very well and will likely learn that the lowest error behavior is to never detect anything. This feature is why you don't need to deal with hard negative mining. It does that for you, but it's obviously going to do something bad if the background is full of unlabeled people.
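The point about unlabeled people being treated as negatives can be illustrated with a toy model. This is not dlib's actual training code, only a sketch of the idea that every candidate window not matching a labeled box counts as background. The overlap rule, the `0.5` threshold, and all names here are assumptions.

```python
def overlap_ratio(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def split_windows(windows, labeled_boxes, min_overlap=0.5):
    """Toy split: windows matching a label are positives, the rest negatives."""
    positives, negatives = [], []
    for w in windows:
        if any(overlap_ratio(w, t) >= min_overlap for t in labeled_boxes):
            positives.append(w)
        else:
            negatives.append(w)
    return positives, negatives


# Two people are in the image, but only the first one is labeled.
person_a = (10, 10, 50, 110)
person_b = (200, 10, 240, 110)   # present in the image, missing from the labels
empty_patch = (100, 100, 140, 200)

positives, negatives = split_windows(
    [person_a, person_b, empty_patch], labeled_boxes=[person_a]
)
```

In this toy setup the window around the unlabeled person ends up in `negatives` next to the empty patch, so the trainer is asked to both detect and reject windows that look like people.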

You should get a more modern dataset, like the Caltech Pedestrian Detection Benchmark and take care that you train on images that are fully and consistently labeled.

davisking commented 7 years ago

Also, this is such a common question that I just added a new FAQ for it: http://dlib.net/faq.html#WhydoesnttheobjectdetectorItrainedwork :)

tej06 commented 7 years ago

Thanks a lot @davisking This sure did help a lot:)

davisking commented 7 years ago

No problem. I should have written this FAQ answer a long time ago :/

madduci commented 7 years ago

Hi @davisking, I found this closed issue and want to share the work I've done so far with a dataset that is publicly available on the internet. To train the pedestrian classifier, I'm using this one, which also contains some scenarios with partially occluded elements. All the data from USC ships with ground truth data, which makes it easier to use with dlib, so I wrote a parser for that XML data and converted it to the format supported by dlib (the same one used for face_detection). If anyone is still interested, you can find the parser (written in Python 3) here: Conversion of Ground Truth data for the USC dataset.

By the way, thank you for all the efforts you put in this work!