ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line
MIT License
52.8k stars, 13.42k forks

Does this library work well at recognizing the faces of Black women? #362

Open MLDSBigGuy opened 6 years ago

MLDSBigGuy commented 6 years ago

In this video https://www.youtube.com/watch?time_continue=6&v=TWWsW1w-BVo, the speaker describes how different libraries misclassified her or were unable to detect her face.

If this is because of the training samples dlib used to train the model, how can we retrain on top of the already-trained model with our failed samples?

All the tutorials I have seen so far are too difficult for me to follow when all I want is to retrain the existing model with a few new samples.

Could someone please share a sample prototype or some ideas for this retraining? I am new to deep learning/TensorFlow/Keras.

Thank you,

ageitgey commented 6 years ago

The problem she talks about is a real and important problem. Most models are trained on whatever datasets are most easily available (for example, pictures of celebrities who have lots of images online). This means that the models can do worse for people not well represented in the training data. The trained models will reflect whatever biases exist in the training data. For example, this module tends to have lower accuracy with people from Asia.

The way to fix the problem is to improve the training data to have a wider range of people. But that's surprisingly difficult because getting access to millions of labeled pictures of people's faces is not easy if you aren't Google or Facebook.

Face detection accuracy can be improved with relatively few images, but face recognition requires millions of images to train the model. Just adding in a few new samples won't help anything. You need to add hundreds of thousands of new samples.

As such, I don't know the answer to this problem yet. To make this program better, I wish there was an open source set of face images where people could contribute images for the benefit of everyone. But at the same time, there are obvious problems with building a public dataset like that with personal data.

To answer your specific question, it's not really practical for you to try to retrain the face recognition model yourself. To do that, you'd need a few million pictures of faces separated by identity and a relatively beefy GPU. And even if someone else already has a dataset of millions of pictures like that for you to start with, those datasets aren't easy to share for various legal reasons (copyright, personal privacy, etc).
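Since retraining is out of reach for most users, a more practical first step is to measure how the existing model behaves on different groups of people. The sketch below is only an illustration of that idea: the function name `detection_rate_by_group`, the group labels, and the sample results are all made up for this example and are not part of the library or of any real evaluation.

```python
from collections import defaultdict


def detection_rate_by_group(results):
    """Return {group: fraction of images in which a face was detected}.

    `results` is an iterable of (group_label, detected) pairs, where
    `detected` is True if the model found a face in that image.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, detected in results:
        totals[group] += 1
        if detected:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical, invented outcomes, only to show the input shape.
sample = [
    ("group_a", True),
    ("group_a", True),
    ("group_a", False),
    ("group_b", True),
    ("group_b", False),
    ("group_b", False),
]

rates = detection_rate_by_group(sample)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

A large gap between groups on your own labeled test images would suggest the kind of training-data bias described above, although a small test set can only hint at it.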

MLDSBigGuy commented 6 years ago

Thank you for the elaborate answer 👍

kaisark commented 6 years ago

Part of the problem is bias in the training set (algorithm/software), but some of the limitations of face detection and recognition are also technical (lighting/sensor data, i.e. hardware). For now, the general rule is that whoever has the most data (or defensible data) wins...

Anyway, I pulled down her TED photo (https://www.ted.com/speakers/joy_buolamwini) and ran the find_faces_in_picture_cnn.py script, and it located her face successfully (results below):

https://www.media.mit.edu/projects/gender-shades/overview
http://gendershades.org/overview.html

"Researcher Joy Buolamwini initiated a systematic investigation after testing her TED speaker photo on facial analysis technology from leading companies. Some companies did not detect her face. Others labeled her face as male. After analyzing results on 1270 unique faces, the Gender Shades authors uncovered severe gender and skin-type bias in gender classification."

[image: ss1]

[image: joybuolamwini_2016x1]
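For anyone who wants to repeat the detection check described above on their own photos, here is a minimal sketch of CNN-based face detection with this library. It is not a copy of find_faces_in_picture_cnn.py; the helper names `detect_faces_cnn` and `to_xywh` and the file name in the usage comment are assumptions made for this example. It relies on `face_recognition.load_image_file` and `face_recognition.face_locations`, which returns boxes as (top, right, bottom, left) tuples.

```python
def detect_faces_cnn(image_path, upsample=0):
    """Return face boxes as (top, right, bottom, left) tuples using the CNN model."""
    # Imported here so that to_xywh() below works even without dlib installed.
    import face_recognition

    image = face_recognition.load_image_file(image_path)
    return face_recognition.face_locations(
        image, number_of_times_to_upsample=upsample, model="cnn"
    )


def to_xywh(location):
    """Convert a (top, right, bottom, left) box to (x, y, width, height)."""
    top, right, bottom, left = location
    return (left, top, right - left, bottom - top)


# Example usage (the file name is a placeholder, not a file from this thread):
#
#     for box in detect_faces_cnn("my_photo.jpg"):
#         print("Face at (x, y, w, h):", to_xywh(box))
```

The CNN model is slower than the default HOG model unless dlib was built with GPU support, but it tends to be the more accurate of the two. Raising `upsample` can help with small faces at the cost of more memory and time.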