davidsandberg / facenet

Face recognition using Tensorflow
MIT License

Embeddings classification ability #134

Closed: lodemo closed this issue 7 years ago

lodemo commented 7 years ago

Hello, I wanted to create a comparison between different face recognition techniques in a classification context. For this I reused the classification experiment from Openface (https://github.com/cmusatyalab/openface/blob/master/evaluation/lfw-classification.py) and integrated Facenet into it. The results, however, are not what I expected: even with just 10 different people, accuracy is only around 0.5. Are the resulting image embeddings different from Openface's in their ability to be used directly for classification with a classifier like an SVM?

At the moment I'm feeding a single image to the network like this and using the resulting embedding as the representation for classification/evaluation:

    imgs = np.reshape(img, (1, 160, 160, 3))  # img is (160, 160, 3)
    feed_dict = {images_placeholder: imgs}
    emb = sess.run(embeddings, feed_dict=feed_dict)
    rep = emb[0]

Or is the error perhaps in the evaluation of the results? I currently use the same accuracy calculation as Openface, the accuracy_score function from sklearn.metrics.
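
For reference, a minimal sketch of this classify-and-evaluate step, assuming the embeddings are already computed (train_embs/test_embs and the label arrays are placeholder names, not from the Openface script):

    # Fit a linear SVM on the face embeddings and score it with
    # sklearn's accuracy_score, as in the Openface evaluation.
    from sklearn.svm import LinearSVC
    from sklearn.metrics import accuracy_score

    clf = LinearSVC(C=1)                  # placeholder hyperparameters
    clf.fit(train_embs, train_labels)     # (n_samples, emb_dim) / person IDs
    predictions = clf.predict(test_embs)
    print(accuracy_score(test_labels, predictions))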

regards

davidsandberg commented 7 years ago

The loss used for training is different in Openface compared to this implementation, but I don't think that should cause any major differences in how the resulting embedding can be used. Could it be something related to scaling/normalization of the input image?

lodemo commented 7 years ago

Thank you for the suggestion. I use the MTCNN-aligned LFW dataset with size 160, as described in Validate on LFW. The Openface script reads the data with cv2 as follows,

    img = cv2.imread(imgPath)
    img = cv2.resize(img, (size, size))
    ...
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

which loads the image with cv2.imread as BGR and then converts it to RGB. Is RGB correct for facenet? The size should not change, as I specified 160. I will look into it further anyway.

lodemo commented 7 years ago

I looked into it further and replaced the data reading with the routine used in Facenet. I had also missed prewhitening completely, which was probably the cause, unless scipy.misc.imread() returns different data than cv2.imread()...
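
For reference, facenet's prewhitening is a per-image standardization; a minimal sketch of what facenet.prewhiten does, to the best of my understanding of the repo's code:

    import numpy as np

    def prewhiten(x):
        # Standardize the image by its own mean and std; the std is
        # clamped from below so flat images don't divide by ~zero.
        mean = np.mean(x)
        std = np.std(x)
        std_adj = np.maximum(std, 1.0 / np.sqrt(x.size))
        return (x - mean) / std_adj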

After this, the results improved significantly, even more than I expected, compared to the Openface results.

[attached plot: accuracies]

What would you say is the cause of such a difference from Openface, given that the embedding dimensions are the same? The different loss function?

davidsandberg commented 7 years ago

Hi @lodemo! That's a very nice result! It would be nice to see how many distractors are needed before the accuracy for "Facenet LinearSVC" starts to degrade.

As for the differences, I guess there are a few. I have never managed to get training with triplet loss to work very well. Training the model as a classifier seems to be much easier, but that becomes impossible if the number of classes is too large. Also, I never managed to get the NN4 model to converge, so the solution was to use the Inception-ResNet models instead. And finally, the alignment of the training images is different: MTCNN is very robust to partial occlusion, silhouettes etc., which makes the training data more diverse.

Which model did you use in this experiment? I have just uploaded a model trained on MS-Celeb-1M that performs 4 times better than the 20161116-234200 model in terms of LFW errors.

lodemo commented 7 years ago

Hi, I used the 20161116-234200 model from the link in Validate on LFW, but I will try it again with the new model!

What exactly do you mean by distractors? A higher number of people? I plan to do that in the next run.

lodemo commented 7 years ago

Hi, I tried running it with the 20170117-215115 model but encountered an error related to the model-20170117-215115.ckpt-285000.data-00000-of-00001 file.

Error:

    tensorflow/core/framework/op_kernel.cc:975] Data loss: Unable to open table file 20170117-215115/model-20170117-215115.ckpt-285000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

Is the file corrupt, or does it need to be loaded differently than the 20161116-234200 model?

davidsandberg commented 7 years ago

Hi, the new restore code appends the last part to the file name itself, so only the first part should be given, i.e. 20170117-215115/model-20170117-215115.ckpt-285000 in this case.
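
A minimal sketch of that loading pattern (TF 1.x; the .meta filename here is an assumption based on the checkpoint naming):

    import os
    import tensorflow as tf

    model_dir = '20170117-215115'
    saver = tf.train.import_meta_graph(
        os.path.join(model_dir, 'model-20170117-215115.meta'))
    with tf.Session() as sess:
        # Pass only the checkpoint prefix; TensorFlow appends the
        # .data-00000-of-00001 / .index suffixes itself.
        saver.restore(
            sess, os.path.join(model_dir, 'model-20170117-215115.ckpt-285000'))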

lodemo commented 7 years ago

Thank you, it seems to work. I will run it and post results later.

regards

lodemo commented 7 years ago

I ran the scripts with the new model 20170117-215115; it seems to have improved the results further.

Most prominently at higher numbers of people, which I increased up to 1600 (the number of people in LFW with 2 or more images):

Model 20161116-234200:
    400 people:  0.992176 accuracy
    1600 people: 0.864177 accuracy

Model 20170117-215115:
    400 people:  0.997066 accuracy
    1600 people: 0.926279 accuracy

[attached plots: accuracies]

What is interesting is comparing the training time needed for the above results with Openface. Both train a LinearSVC with a one-vs-rest scheme:

[attached plot: training times]

As the dimensions are the same, the SVM seems to take longer to fit the Facenet features.

hardfish82 commented 7 years ago

@lodemo It's amazing work! I followed https://github.com/cmusatyalab/openface/blob/master/evaluation/lfw-classification.py and added the facenet model (20170117-215115), but my result is not as good as yours. What did you improve? The C parameter of LinearSVC?

[attached plot: accuracies]

lodemo commented 7 years ago

Actually no, I left C=1, but I used the one-vs-rest scheme explicitly: LinearSVC(C=1, multi_class='ovr')

Did you apply all the preprocessing for the Facenet images? RGB conversion, cropping, and prewhitening of the MTCNN-aligned images are important.
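
A minimal sketch of those steps, assuming 160x160 MTCNN-aligned crops on disk (img_path is a placeholder; facenet.load_data bundles the same operations):

    import cv2
    import facenet

    img = cv2.imread(img_path)                  # cv2 loads images as BGR
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # the facenet model expects RGB
    img = facenet.prewhiten(img)                # per-image standardization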

I am currently also running an evaluation of Openface and Facenet on the YouTube Faces database and will post results when finished.

hardfish82 commented 7 years ago

I used the SVC API like this:

    cls = SVC(kernel='linear', C=1)

Just now I tested your scheme:

    cls = LinearSVC(C=1, multi_class='ovr')

The result improves slightly:

[attached plot: accuracies]

I did the data preprocessing according to the facenet code: in the function getData() I added a new mode 'facenet' and used the facenet API to load the data:

import os

import cv2
import numpy as np
import facenet

def getData(lfwPpl, nPpl, nImgs, mode, resize=96):
    X, y = [], []

    personNum = 0
    for (person, nTotalImgs) in lfwPpl[:nPpl]:
        imgs = sorted(os.listdir(person))
        for imgPath in imgs[:nImgs]:
            imgPath = os.path.join(person, imgPath)
            img = cv2.imread(imgPath)
            img = cv2.resize(img, (resize, resize))
            if mode == 'grayscale':
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            elif mode == 'rgb':
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            elif mode == 'facenet':
                # facenet.load_data reads, prewhitens and crops the image
                # itself, so the cv2 result above is discarded in this mode
                img = facenet.load_data([imgPath], False, False, resize)[0]
            else:
                assert 0

            X.append(img)
            y.append(personNum)

        personNum += 1

    X = np.array(X)
    y = np.array(y)
    return (X, y)

lodemo commented 7 years ago

I see, it's probably because you resize the images for Facenet to 96; with the pre-trained model it should be 160. Or do you call getData with resize=160 for Facenet?

Did you align the images to 160 with MTCNN?

hardfish82 commented 7 years ago

Yes, the LFW images are aligned to size 160x160 and I load the facenet data with resize=160; otherwise a shape mismatch error would be raised.

    net = facenet_model.FaceNet("/home/x0269/models/20170131-234652")
    facenetGPUsvmDf = cacheToFile(cache)(openfaceExp)(lfwPpl, net, cls, 'facenet', 160)

Did you filter the LFW images with https://github.com/cmusatyalab/openface/blob/master/util/prune-dataset.py? I did it with --numImagesThreshold=5, so persons with fewer than 5 images are filtered out. The total number of remaining persons is 423.

lodemo commented 7 years ago

Ah no, I did not prune the images.

In getLfwPplSorted the people are returned sorted by number of images, so selecting the first 100 people should give the same result as the pruned dataset, I think?
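
A hedged sketch of what getLfwPplSorted presumably does (I have not verified the Openface script; one directory per person is assumed):

    import os

    def get_lfw_people_sorted(lfw_dir):
        # Pair each person's directory with its image count and sort
        # descending, so the first N entries are the people with the
        # most images, similar in effect to pruning by a threshold.
        people = [(os.path.join(lfw_dir, name),
                   len(os.listdir(os.path.join(lfw_dir, name))))
                  for name in os.listdir(lfw_dir)
                  if os.path.isdir(os.path.join(lfw_dir, name))]
        return sorted(people, key=lambda entry: entry[1], reverse=True)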

lodemo commented 7 years ago

You can try running my version of the script and see if anything changes: https://gist.github.com/lodemo/f49ac4a7402d2de3163cf5adfad79d43

I split most methods between Openface and Facenet; a little redundant, but it works for now.

hardfish82 commented 7 years ago

Thanks @lodemo. The difference is in the alignment method. Following https://cmusatyalab.github.io/openface/demo-3-classifier/, I used dlib alignment, so the former result is based on dlib-aligned faces. I just tried the MTCNN alignment method from https://github.com/davidsandberg/facenet/wiki/Validate-on-LFW and got the same result as yours. It's amazing! Thanks one more time.

[attached plot: accuracies]