Hi,
We saw that dlib has some accuracy problems with Asian faces. Could you retrain dlib in order to get better results, including this dataset (http://afad-dataset.github.io)?
Thanks.
Does that dataset include identity information? It looks like it's just age and gender.
@davisking What do you mean by identity information? I don't understand.
Thanks.
To train a face recognition model you need lots of images of the same person. Like you need "here are 100 images of Davis", then "here are 100 images of John". Not "here are a bunch of images of different people and none of them are the same person".
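For concreteness, dlib's dnn_metric_learning_on_images_ex.cpp represents exactly this kind of identity information as one subdirectory per person, and the metric loss treats all images inside one subdirectory as "same person" pairs. A minimal loader in the spirit of that example (the dataset/person_XXXX layout is an assumption):

```cpp
#include <dlib/dir_nav.h>
#include <string>
#include <vector>
using namespace dlib;

// Expects a layout like dataset/person_0001/*.jpg, dataset/person_0002/*.jpg, ...
// Each inner vector holds all image paths for one identity; the metric loss
// learns from "same folder = same person" groupings like these.
std::vector<std::vector<std::string>> load_objects_list(const std::string& dir)
{
    std::vector<std::vector<std::string>> objects;
    for (auto subdir : directory(dir).get_dirs())
    {
        std::vector<std::string> imgs;
        for (auto img : subdir.get_files())
            imgs.push_back(img);
        if (imgs.size() != 0)
            objects.push_back(imgs);
    }
    return objects;
}
```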
I got it! I think this dataset only has a bunch of images of different people. Do you have any plans to make dlib more accurate on Asian faces?
Only if someone is able to provide me with appropriate training data.
How much data do you need (how many persons, and how many images per person)? E.g. 1000 persons and 100 images per person?
It's hard to say, but for sure at least something like 1000 persons and 100 images per person. That would be a minimum. The more the better.
I am willing to provide data: I have around 10k+ unique identities, but only about 10 faces each. Could we just apply some kind of image transformation to generate the required number of images from those 10? Would that be sufficient, or do you need a more precise dataset? Thanks.
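dlib does ship a jittering helper that can stretch a small set of photos this way, though the jittered copies are highly correlated and no substitute for genuinely distinct photos of each person. A rough sketch (the `augment` helper is hypothetical; `dlib::jitter_image` is the real call):

```cpp
#include <dlib/image_transforms.h>
#include <dlib/matrix.h>
#include <dlib/rand.h>
#include <vector>
using namespace dlib;

// Hypothetical helper: expand one aligned face chip into n randomly
// jittered copies (small random crops, rotations, and zooms). This adds
// some robustness but does not add new information about the person.
std::vector<matrix<rgb_pixel>> augment(const matrix<rgb_pixel>& img, int n)
{
    thread_local dlib::rand rnd;
    std::vector<matrix<rgb_pixel>> crops;
    for (int i = 0; i < n; ++i)
        crops.push_back(jitter_image(img, rnd));
    return crops;
}
```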
I have already trained a model for Asian faces with 98.18% accuracy. This is the output after training:
saving network
Testing network on imagenet validation dataset...
val top5 accuracy: 1
val top1 accuracy: 0.981811
However, I am not able to use this model. If it works out, I am willing to train on more and more data and make the model available here. Please see issue #1368 if you can help me.
@davisking Will this help? https://github.com/deepinsight/insightface/issues/256
@davisking Can you use this dataset to train? deepinsight/insightface#256
That sounds useful, although the website appears to be down.
@davisking The website is slow, but it's working. I just tested it.
@davisking Hi, could you consider training and evaluating dlib with this dataset? Has anyone tried to do this?
It looks like the same dataset is available here (as "Glint"), amongst others. Once you register and log in to the trillionpairs website, you are directed to download the data from:
https://drive.google.com/drive/folders/1ADcZugpo8Z6o5q1p2tIAibwhsL8DcVwH
I downloaded this dataset and it seems pretty nice. Altogether I've got a dataset of 10 million faces now. I'm pretty busy with other things at the moment, but at some point I'll retrain the model and post the results.
@davisking Which step(s) (and/or specific models) in the following process will the retraining help with?
face detection -> encoding -> clustering
I realised that I had been using sklearn's DBSCAN and that once I switched to dlib's Chinese whispers algorithm the results were much better, at least for male Asian faces. Results for female Asian faces were still pretty bad.
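For anyone making the same switch: dlib's Chinese whispers clusters a thresholded pair graph rather than raw points. A sketch in the spirit of dnn_face_recognition_ex.cpp, using 0.6, the distance cutoff suggested for the stock dlib model:

```cpp
#include <dlib/clustering.h>
#include <dlib/matrix.h>
#include <vector>
using namespace dlib;

// face_descriptors: one 128-D vector per detected face, produced by the
// recognition net. Build an edge for every pair closer than 0.6, then let
// chinese_whispers assign one cluster label per face.
std::vector<unsigned long> cluster_faces(
    const std::vector<matrix<float,0,1>>& face_descriptors)
{
    std::vector<sample_pair> edges;
    for (size_t i = 0; i < face_descriptors.size(); ++i)
        for (size_t j = i + 1; j < face_descriptors.size(); ++j)
            if (length(face_descriptors[i] - face_descriptors[j]) < 0.6)
                edges.push_back(sample_pair(i, j));

    std::vector<unsigned long> labels;
    chinese_whispers(edges, labels);  // labels[i] is face i's cluster id
    return labels;
}
```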
We aren't talking about improving the detector. This dataset would make the part that answers questions like "are these two images the same person" more accurate.
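In other words, the part a retrain would improve is the embedding behind a comparison like the following (a minimal sketch; 0.6 is the cutoff suggested for the stock model):

```cpp
#include <dlib/matrix.h>
using namespace dlib;

// The recognition net maps each face chip to a 128-D descriptor; "same
// person?" then reduces to a Euclidean distance test. Retraining on more
// diverse data shifts where descriptors land, so this test fails less
// often on under-represented faces.
bool same_person(const matrix<float,0,1>& a, const matrix<float,0,1>& b)
{
    return length(a - b) < 0.6;
}
```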
@davisking any plans to have it done?
I'll get to it when I get to it.
Hi Davis, Is there any progress in training model for Asian faces?
I still haven't gotten to it. I have many other responsibilities and making this model is not super high on my priority list.
Could you tell me the step-by-step process to retrain the model with additional Asian faces? Thank you.
Can I do transfer learning instead of retraining the model from the beginning? Thank you for your help @davisking
@davisking The Diversity in Faces (DiF) is a large and diverse dataset that seeks to advance the study of fairness and accuracy in facial recognition technology. The first of its kind available to the global research community, DiF provides a dataset of annotations of 1 million human facial images. - from IBM
Link to dataset: https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/
Link to paper: https://www.research.ibm.com/artificial-intelligence/publications/paper/?id=Diversity-in-Faces
@davisking Thank you so much for putting all of this together! Would it be possible for you to share your entire dataset? Thanks.
I don't want to get into the large dataset hosting game.
Hello everyone, I'm also facing the same problem right now. Has anyone here extended @davisking's model with Asian faces?
Yes, this is an issue I'm running into also. The model produces a lot of false positives when comparing Asian faces.
Same question... can I do transfer learning instead of retraining the model from the beginning? Thank you for your help @davisking
I trained a model using the dnn_metric_learning_on_images_ex.cpp code. The only change I made was increasing the iterations-without-progress threshold to 10000.
I used a dataset with 63K identities that is a combination of VGG2, Asian Celeb, and the Clean Microsoft Dataset. I also did some manual cleaning. All identities have at least 50 images.
The result was not good. I downloaded the FaceScrub dataset and generated 5K random genuine pairs and 5K random distractor pairs. EER, FMR100, and FMR1000 were all worse on this new model.
I don't know what went wrong, since I was expecting a good improvement.
@davisking, do you have any advice on this? Should I increase the batch size (this would increase training time a lot)?
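For anyone reproducing this setup: the two knobs discussed here are ordinary dnn_trainer settings. A compilable sketch with a deliberately tiny stand-in network (the real example uses the ResNet defined in dnn_metric_learning_on_images_ex.cpp; the settings are the same):

```cpp
#include <dlib/dnn.h>
using namespace dlib;

// Tiny stand-in metric-learning network, just so this sketch compiles;
// the real example's net is a ResNet over face chips.
using net_type = loss_metric<fc<128, input<matrix<float,0,1>>>>;

int main()
{
    net_type net;
    dnn_trainer<net_type> trainer(net, sgd(0.0001, 0.9));
    trainer.set_learning_rate(0.1);
    trainer.be_verbose();

    // The example ships with 300 here; raising it (e.g. to 10000, as in the
    // comment above) keeps the learning rate from being shrunk prematurely
    // on a large dataset.
    trainer.set_iterations_without_progress_threshold(10000);

    // "Batch size" in the image example is identities-per-batch times
    // images-per-identity, set by the arguments to load_mini_batch(5, 5, ...).
    // Raising those numbers gives the metric loss many more positive and
    // negative pairs per step, at the cost of GPU memory.
}
```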
@tiago-alves How long did training take, and which hardware (GPU and CPU) did you use?
I remember he said 1000 identities with 100 photos each is the minimum. Any progress?
@basit26374, I used a VM on Google Cloud with 8 vCPUs and 4 GPUs. I implemented data augmentation with glasses/sunglasses and hairstyles, which adds some extra time for sure. If you are going to train on a big dataset, be prepared to wait several days for it to complete. At some point it becomes very difficult to converge.
@JoeQian, using the data augmentation approach I mentioned, and after some more data cleaning, I was able to achieve a big improvement in accuracy (on a benchmark dataset I created).
Basically my advice is: keep getting new images and growing your dataset with different people. I will probably double my dataset size in the next two months. I will let you know once I have new numbers.