davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0
13.59k stars 3.38k forks

How to train shape_predictor to obtain shape_predictor_68_face_landmarks.dat #359

Closed rudymelli closed 7 years ago

rudymelli commented 7 years ago

Hello,

I'm trying to understand how to create a custom shape predictor model for face landmark extraction with dlib. I've tried several trainings (changing the training parameters) with images from the 300-W challenge (helen, afw, lfpw, ibug; 3837 different images in total). I used these because I found the face boxes on the competition pages: http://ibug.doc.ic.ac.uk/resources/300-W/ http://ibug.doc.ic.ac.uk/media/uploads/competitions/bounding_boxes.zip
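For anyone reproducing this: dlib's training tools read an XML label file in the imglab format, with one box per face and one part entry per landmark. A minimal fragment (file names and coordinates here are made-up placeholders):

```xml
<?xml version='1.0' encoding='ISO-8859-1'?>
<dataset>
  <name>Training faces</name>
  <images>
    <image file='helen/trainset/100032540_1.jpg'>
      <box top='112' left='84' width='300' height='300'>
        <part name='00' x='120' y='250'/>
        <part name='01' x='122' y='270'/>
        <!-- ... one <part> entry per landmark, 68 in total ... -->
      </box>
    </image>
  </images>
</dataset>
```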

But the landmark extraction with the model I created is always bad, and the original file shape_predictor_68_face_landmarks.dat, provided with dlib, is always much better.

In order to understand why the difference is so big, I'd like to ask how you obtained shape_predictor_68_face_landmarks.dat.

Thank you very much. Rudy

davisking commented 7 years ago

I used the training data here: http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz. It's just 300-W, but with mirrored images added in.

I don't remember the training parameters, but they were essentially the default values.

You have to be consistent in your placement of the boxes between training and testing. As long as you are consistent it doesn't matter how you do it.

rudymelli commented 7 years ago

Thanks for your reply! In the dataset you linked, were the face boxes extracted with the dlib face detector? I tried it, but in some images no face is found; did you discard those images from training?

davisking commented 7 years ago

In that case I just put the box in the right place with some other method.

rudymelli commented 7 years ago

What do you mean by "other method"? Manually? How do you preserve consistency in the box extraction?

davisking commented 7 years ago

I wrote a program that fixed it quasi-manually, using some heuristics plus my own review. I don't think it matters. Just be careful about your labeling.

kaygudo commented 7 years ago

Hello @davisking, this question is a bit off-topic for this issue. I am using the dlib Python modules. I am particularly interested in separately detecting the right and left eyebrows, the eyes, the nose, the mouth or lips, and the jaw line (see the attached image). My question is: is there a way to do so within the framework of dlib (C++ or Python), given that the Python dlib module provides the 68 facial points as a whole? Or do I need to train my own detector for each case? Regards.

[image: landmarked_face2]

davisking commented 7 years ago

I don't know what to say. That's literally what the landmarking model you mentioned does.
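For anyone landing here with the same question: the 68 points follow the fixed iBUG annotation layout, so each facial feature is just a fixed slice of the predictor's output. A small sketch (the index ranges are the standard iBUG convention; the helper function name is made up for illustration):

```python
# Standard iBUG 68-point index ranges; slice the shape predictor's
# output (parts 0..67) to get each facial feature separately.
FACIAL_LANDMARKS_68 = {
    "jaw":           range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow":  range(22, 27),
    "nose":          range(27, 36),
    "right_eye":     range(36, 42),
    "left_eye":      range(42, 48),
    "mouth":         range(48, 68),
}

def feature_points(shape, feature):
    """Return (x, y) tuples for one facial feature, given a dlib
    full_object_detection returned by the shape predictor."""
    return [(shape.part(i).x, shape.part(i).y)
            for i in FACIAL_LANDMARKS_68[feature]]
```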

scotthong commented 7 years ago

If you are looking for an alternate way to use the dlib face landmark detector that extracts and aligns the detected face chips, please check out the following repo: https://github.com/scotthong/dlib-align-faces

davisking commented 7 years ago

The main example program for face landmarking, http://dlib.net/face_landmark_detection_ex.cpp.html, extracts aligned faces. So you don't need anything else.

scotthong commented 7 years ago

The main purpose of my project is to allow integration of the face landmark detector with non-C++ applications such as Java or Python. I've integrated it with a Java-based face recognition program I am working on.

https://github.com/scotthong/dlib-align-faces

The dlib Python library does not provide an interface to extract the aligned face chips. The align_dlib module provided by the facenet project uses OpenCV to align the face chips, but the result is not as good as dlib's.

https://github.com/davidsandberg/facenet/tree/master/src/align

I've also wrapped the dlib face detector using Java JNI. However, the application becomes unstable when running multiple instances of the shape detectors. The images are typically 1920x1080 (1080p). I don't have a good way to measure the amount of memory required to process an image of a given size, and the Java-based server program cannot handle the heavy memory usage; it crashes under heavy load. The application is very stable when running only one instance, though! Thanks for sharing your great dlib library. Go Bucks!

davisking commented 7 years ago

Ha, cool. Sounds good.

It definitely shouldn't crash under load like that though. You have a race condition somewhere in the code. Make sure you aren't touching any objects from multiple threads at the same time without protection by a mutex or similar synchronization mechanism.
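The synchronization advice above can be sketched like this. The predictor here is a stand-in callable, not a real dlib object; the point is the pattern of serializing access to one shared non-thread-safe instance through a lock (the alternative is giving each thread its own copy):

```python
import threading

class GuardedPredictor:
    """Wrap a non-thread-safe object (e.g. a dlib shape_predictor) so
    that concurrent callers serialize through one lock instead of racing."""
    def __init__(self, predictor):
        self._predictor = predictor
        self._lock = threading.Lock()

    def __call__(self, *args):
        with self._lock:
            return self._predictor(*args)

# Stand-in for a real predictor: any callable works for the demo.
shared = GuardedPredictor(lambda x: x * 2)

results = []
def worker(n):
    results.append(shared(n))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # → [0, 2, 4, 6, 8, 10, 12, 14]
```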

scotthong commented 7 years ago

Thanks for the quick response! The same Java-JNI dlib wrapper can handle 12 concurrent threads as a console-mode program under heavy load for days without any problem. It only has stability issues when integrated with a servlet container. I haven't had a chance to find the root cause yet. Regardless, integrating with an external native executable is much easier, at the cost of reloading the shape model each time the program is executed. Processing a 1080p image typically takes 3 seconds. Is there a way to improve the performance further?

So far, the MTCNN/TensorFlow based implementation yields the most consistent results. It fails to identify only one image in the original (not pre-aligned) LFW dataset.

https://github.com/davidsandberg/facenet/tree/master/src/align https://kpzhang93.github.io/MTCNN_face_detection_alignment/

Do you have a plan to implement a dlib version of MTCNN? I also tried dnn_mmod_face_detection_ex for face detection and the results (the size of the bounding box) are not very consistent! Maybe you could use MTCNN to obtain the bounding boxes to train a dnn_mmod_face_detection_ex model in dlib. Just a thought! Thanks!

IAmUnbounded commented 7 years ago

Hi @davisking, what value of padding should be used when training on the face landmark detection dataset?

davisking commented 7 years ago

Read Kazemi's paper. It's all in there.

IAmUnbounded commented 7 years ago

Padding is not mentioned in the paper. Which parameter in the paper does padding correspond to?

shoaibsattar823 commented 6 years ago

Is there a trained model for 194-point facial landmarks for Python?

davisking commented 6 years ago

No

thiruandroidhub commented 6 years ago

Hi @davisking, I am prototyping an Android app to detect facial landmarks. For that I am using dlib-android, which is the ported version of dlib for Android. Can I use that in commercial apps? As you have mentioned before, the model it uses is shape_predictor_68_face_landmarks.dat, which needs approval from UCL. I have sent them a request; is that all I need approval for, or is there anything else? I am overall happy with the results I get from dlib-android with this model and would like to use it in a commercial app. If UCL doesn't grant access to the dataset, what are my alternatives for training a model? Please advise. Thanks

davisking commented 6 years ago

You can use dlib any way you want.

However, in the machine learning field right now there is a generally unclear issue of what kinds of rights owners of content have over machine learning models trained on that content. For example, a whole lot of models are trained on data scraped from the internet and most people and companies just ignore this issue. Just ignoring it is standard practice right now. However, UCL wants to claim that any models trained on their dataset, which is images scraped from the internet + annotations they made, are owned by UCL. I am not a lawyer, I have no idea how enforceable that is. If you want legal advice you need to talk to a lawyer.

thiruandroidhub commented 6 years ago

Thanks for your reply. I am still waiting for UCL's response, and I will talk to a lawyer as you suggest anyway. Cheers!

sandhyacs commented 5 years ago

Hi Davis,

Can I add some extra features, for example neck points, to the existing model shape_predictor_68_face_landmarks.dat? If so, do I have to follow the same procedure you used for the face?

davisking commented 5 years ago

You can add anything you want, but you have to retrain the model. Follow the documented procedure.

sandhyacs commented 5 years ago

Thank you for your reply. May I know whether I can convert this model to .json or not?


imaccormick275 commented 5 years ago

Hi All,

I'm trying to train a landmark detector on the Helen dataset (with mirroring) for 194 landmarks. I'm trying to understand what would be considered a 'good' error, so I looked at the train/test error of the shape_predictor_68_face_landmarks.dat model on the ibug_300W_large_face_landmark_dataset (see code below). I'm seeing a train/test error of around 6. Intuitively I was expecting something much closer to zero. Do these errors make sense to others?

model_dat = 'shape_predictor_68_face_landmarks.dat'

train_xml = 'ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml'
test_xml = 'ibug_300W_large_face_landmark_dataset/labels_ibug_300W_test.xml'

measure_model_error(model_dat, train_xml)
measure_model_error(model_dat, test_xml)

Error of the model: shape_predictor_68_face_landmarks.dat is 6.8888672656038965
Error of the model: shape_predictor_68_face_landmarks.dat is 6.204732789912478

hakS07 commented 5 years ago

Hi all, I'm trying to train a shape predictor on my custom dataset (1300 selfie images, 513x513) to landmark the iris in the face. For the XML annotation:
1) I used 20 parts (x, y) per image; the points were extracted using a deep learning segmentation model.
2) I used 1 box per image (knowing that my data contains one face per image); the boxes (top, left, width, height) were detected using face detection models (Haar cascade and YOLO).
I prepared the XML file for training (using the default training options), but the result was very bad.
[image: Capture du 2019-09-27 18-33-27]
Any help??