ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line
MIT License
52.84k stars 13.43k forks

Accuracy is very low on a large database of images #237

Open ramineniraviteja opened 6 years ago

ramineniraviteja commented 6 years ago

Hi,

I have a database of 50 million unique faces, i.e. one image per person. To test accuracy, I initially took 36,000 images, and surprisingly the accuracy is very low even with the 0.6 threshold. If I count the known faces for which compare_faces returns True against an unknown face, I get thousands of matching faces. I manually set the threshold to 0.4, but the results are still disappointing and it does not match the same people. Could you please suggest what I can do to improve accuracy on a large database of images? I have only one image per person. Please also suggest any other algorithm or approach that might work in my case.
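For context: in face_recognition, `compare_faces(known_face_encodings, face_encoding_to_check, tolerance=0.6)` returns True for every known encoding whose Euclidean distance to the unknown encoding is at or below `tolerance`, and lower values are stricter. Against a gallery of tens of thousands of faces, even a small per-comparison false-match rate produces many Trues (with a hypothetical rate of 0.1%, 36,000 comparisons would give about 36 false matches). The sketch below is written in plain Python so it runs without the library; it mimics that thresholding and contrasts it with keeping only the single nearest encoding. The function names and the 3-dimensional toy vectors are hypothetical; real encodings have 128 dimensions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance, the metric face_recognition uses for encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_matches(known: List[Vector], unknown: Vector, tolerance: float = 0.6) -> List[int]:
    """Indices of every known encoding within `tolerance` (mimics compare_faces)."""
    return [i for i, k in enumerate(known) if euclidean(k, unknown) <= tolerance]


def nearest_match(
    known: List[Vector], unknown: Vector, tolerance: float = 0.6
) -> Optional[Tuple[int, float]]:
    """Index and distance of the single closest encoding, or None if even it is too far."""
    if not known:
        return None
    distances = [euclidean(k, unknown) for k in known]
    best = min(range(len(distances)), key=distances.__getitem__)
    if distances[best] > tolerance:
        return None
    return best, distances[best]


if __name__ == "__main__":
    # Toy 3-D "encodings" (hypothetical; real ones are 128-D).
    gallery = [
        [0.10, 0.20, 0.30],  # person 0
        [0.15, 0.25, 0.35],  # person 1, close to person 0
        [0.90, 0.90, 0.90],  # person 2, far away
    ]
    probe = [0.12, 0.22, 0.31]
    print(threshold_matches(gallery, probe))  # [0, 1]: both pass at 0.6
    print(nearest_match(gallery, probe))      # index 0, distance about 0.03
```

Keeping only the nearest encoding (and still requiring it to be within tolerance) returns at most one identity per query, which fits a one-image-per-person gallery. It does not make the embeddings themselves more discriminative, so look-alikes can still be confused.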

ageitgey commented 6 years ago

If you are willing to share the images, that could be a great dataset for retraining the model to work better on a wider range of faces.

ramineniraviteja commented 6 years ago

Sorry, this is a confidential dataset of members of the public. I cannot share it; doing so would be a privacy violation. Do you have any other ideas for improving accuracy?

yaoxx255 commented 6 years ago

Hi ramineniraviteja.

This happened to me as well. I think it is because the image you use for comparison is low resolution. Think about it: we use picture A, which is low resolution, as the sample person, and we want to find another picture B of the same face in your database. If the pictures in your huge database are much higher resolution than picture A, the accuracy cannot be high.

The model generates an embedding (128 measurements) for each face, and it is these embeddings that are actually compared. They come from features of the face, such as the proportions of the lengths of the eyes and mouth, etc. However, if the resolution is low, a one-pixel offset can change those "proportions" a lot. The code will still generate an embedding, but it is less reliable, so the model loses its ability to distinguish different people because it cannot capture the accurate features (the embedding) of each face.
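A small worked illustration of why slightly unreliable measurements matter: the distance between two encodings is the Euclidean norm over all 128 dimensions, so small per-dimension errors add up. If every dimension of a hypothetical re-encoding of the same face were shifted by δ, the distance would be δ·√128 ≈ 11.3·δ. So δ = 0.05 already gives about 0.57, close to face_recognition's default tolerance of 0.6, and δ = 0.06 gives about 0.68, which would be rejected. The uniform shift is a simplification chosen only to make the arithmetic easy; real encoding errors are neither uniform nor independent of the face.

```python
import math


def encoding_distance(a, b):
    """Euclidean distance between two equal-length encodings."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shifted_copy(encoding, delta):
    """Hypothetical noisy re-encoding: every dimension moved by the same delta."""
    return [x + delta for x in encoding]


DIMS = 128       # face_recognition encodings have 128 values
TOLERANCE = 0.6  # compare_faces default tolerance

base = [0.0] * DIMS  # placeholder encoding; only the differences matter here
for delta in (0.02, 0.05, 0.06):
    d = encoding_distance(base, shifted_copy(base, delta))
    print(f"shift {delta:.2f} per dimension -> distance {d:.3f}, match: {d <= TOLERANCE}")
```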

This issue happens a lot when the resolutions of the sample and target pictures do not match, and it is still far from solved. If you do some basic reading on image data formats, you can see the difference between a "pixel image", which gets blurry when zoomed in, and a "vector image", which stays sharp when zooming in and out. Pixels are the most superficial way to define a picture, and manipulating pixels is the hardest.
Maybe in the future a new image data format could be invented specifically for storing human faces, containing only the proportions and other features of the face instead of pixels. Or, more generally, you can imagine a new data format containing features that can be understood by an "image compiler", which would actually be a learning machine.

Hope you are all well.

AnthonyPluth commented 6 years ago

So to confirm, if I want to perform recognition on low-medium quality images, would it be best to use high quality images for training? Or would it be better to use the same quality images for training and recognition?

AlainPilon commented 6 years ago

In a perfect world, the face in the training image would have the same size (in px) as the faces you expect to find in the images to process.
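One way to act on this, sketched under assumptions: measure the face height in pixels in both images from the boxes that `face_recognition.face_locations` returns (tuples in `(top, right, bottom, left)` order), then compute how much to rescale the known image so its face roughly matches the size expected in the images to process. The helper names below are hypothetical, the code only does the arithmetic (plain Python, no image library), and the actual resize would be done with whatever imaging library you already use. Downscaling a large reference face is generally safer than upscaling a small one, because upscaling cannot restore detail that was never captured.

```python
from typing import Tuple

# (top, right, bottom, left), the order face_recognition.face_locations uses
Box = Tuple[int, int, int, int]


def face_height(box: Box) -> int:
    """Height of a face box in pixels."""
    top, _right, bottom, _left = box
    return bottom - top


def rescale_size(image_size: Tuple[int, int], known_box: Box, target_face_px: int) -> Tuple[int, int]:
    """New (width, height) for the known image so its face is about target_face_px tall."""
    current = face_height(known_box)
    if current <= 0 or target_face_px <= 0:
        raise ValueError("face heights must be positive")
    scale = target_face_px / current
    width, height = image_size
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # Hypothetical numbers: a 2000x1500 reference photo with a 400 px tall face,
    # while faces in the images to process are typically about 100 px tall.
    known_box = (300, 1200, 700, 800)
    print(rescale_size((2000, 1500), known_box, target_face_px=100))  # (500, 375)
```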