If I use YouTube-Faces, do I need to adjust the face bounding boxes somehow, or just use the labels as-is? You mentioned earlier that you filter out low-resolution videos: which resolution did you consider low?
@pokidyshev For the first comment, we use fixed cropping because that face dominates the video, so no face detector is needed to provide the bounding box. For your own videos, you can use any pre-trained face detector, including cv2/dlib. However, this raises a problem: your detector might differ from the one used on the labeled images, which may cause some discrepancy because your training data was not processed by the same detector.
For the second comment, you can directly use the face bounding boxes provided by YouTube-Faces. A face smaller than 256×256 is considered low-resolution.
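In case it helps later readers, here is a minimal sketch (not from this repo) of getting a per-frame face box with dlib's frontal face detector via OpenCV; the detector choice, helper name, and frame file name are assumptions on my part, and the detector-mismatch caveat above still applies:

```python
import cv2
import dlib

# HOG-based frontal face detector shipped with dlib (an assumed choice; any detector works).
detector = dlib.get_frontal_face_detector()

def detect_face_box(frame_bgr):
    """Return (x1, y1, x2, y2) of the largest detected face, or None if nothing is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)  # upsample once to help with small faces
    if len(rects) == 0:
        return None
    rect = max(rects, key=lambda r: r.width() * r.height())
    return rect.left(), rect.top(), rect.right(), rect.bottom()

# Hypothetical usage on a frame extracted from your video.
frame = cv2.imread("frame_000001.jpg")
print(detect_face_box(frame))
```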
@D-X-Y Thanks for the reply! What should the loss value be after training SBR for 50 epochs on 300W+300VW+YouTube-Faces?
Since this work was done almost two years ago, I cannot remember the exact loss value. Sorry about that. A simple way to monitor whether your training works well is to check whether the detection loss on the labeled data decreases smoothly (or stays around a similar value).
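As a hedged illustration of that check (not code from the repo; the per-epoch values and the tolerance below are hypothetical), you could collect the average detection loss per epoch from your training log and verify it never jumps up sharply:

```python
# Hypothetical per-epoch detection losses on the labeled data, parsed from your log.
det_losses = [0.42, 0.31, 0.27, 0.26, 0.25]

def looks_stable(losses, tolerance=0.05):
    """Return True if the loss never increases by more than `tolerance` (relative)
    between consecutive epochs, i.e. it decreases smoothly or stays flat."""
    return all(cur <= prev * (1.0 + tolerance) for prev, cur in zip(losses, losses[1:]))

print("detection loss looks stable:", looks_stable(det_losses))
```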
Ok, thanks so much!
How long did it take you to train 50 epochs? And which hardware did you use?
UPD: found in another issue: "I remember it takes several days to train on 4 Titan V GPUs"
Yes, on NVIDIA Tesla V100 32GB GPUs.
I started training on YouTube-Faces and stumbled upon a strange thing: the NME rises after the 1st epoch and then drops continuously. But as of the 4th epoch, it is still not as good as it was before SBR. Is that OK?
Here is a part of my log:
Compute NME and AUC for 689 images with 68 points :: [(NME): mean=3.884, std=2.461], auc@0.07=48.442, auc@0.08=54.057, acc@0.07=91.872, acc@0.08=94.630
Compute NME and AUC for 689 images with 68 points :: [(NME): mean=4.356, std=2.930], auc@0.07=43.590, auc@0.08=49.573, acc@0.07=89.550, acc@0.08=93.324
Compute NME and AUC for 689 images with 68 points :: [(NME): mean=4.110, std=2.746], auc@0.07=46.166, auc@0.08=51.947, acc@0.07=91.001, acc@0.08=93.469
Compute NME and AUC for 689 images with 68 points :: [(NME): mean=4.059, std=2.928], auc@0.07=47.113, auc@0.08=52.759, acc@0.07=90.566, acc@0.08=93.904
Compute NME and AUC for 689 images with 68 points :: [(NME): mean=4.038, std=2.862], auc@0.07=47.243, auc@0.08=52.995, acc@0.07=91.872, acc@0.08=94.485
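For context on what these numbers measure: NME is the mean per-landmark error normalized by a reference distance. Below is a minimal sketch of the metric for a single face; normalizing by the inter-ocular distance (outer eye corners of the 68-point markup) is a common choice for 300-W style evaluation, but the exact normalizer used by the repo may differ:

```python
import numpy as np

def compute_nme(pred, gt, left_eye_idx=36, right_eye_idx=45):
    """Normalized Mean Error for one face.

    pred, gt: (68, 2) arrays of predicted / ground-truth landmark coordinates.
    Normalization by the inter-ocular distance is an assumption here.
    """
    per_point = np.linalg.norm(pred - gt, axis=1)                      # (68,)
    inter_ocular = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return per_point.mean() / inter_ocular

# Hypothetical usage with random coordinates, just to show the expected shapes.
gt = np.random.rand(68, 2) * 256
pred = gt + np.random.randn(68, 2)
print(f"NME = {compute_nme(pred, gt):.4f}")
```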
Yes, "the NME raises at the first several epochs" is normal. As long as it will finally drop to a lower value than the initial NME, it will be fine. The SBR requires much more epochs than 4, in my memory, I trained it for more than 100 epochs.
Ok, thanks so much!
Hi, @D-X-Y
Thanks for sharing your awesome results.
I was playing a bit with your code and found out that you use a hardcoded bounding box for the SBR stage:
gap, x1, y1, x2, y2 = 5, 5, 5, 450, 680
So, basically, you are using the whole image, with 5 px of padding cropped from the sides.
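For illustration only, here is a sketch of what that fixed box selects from a frame, assuming the frame is loaded with OpenCV (the file names are hypothetical; the repo itself consumes the box through its own data pipeline):

```python
import cv2

# Values copied from the hardcoded box above; `gap` is kept only for completeness.
gap, x1, y1, x2, y2 = 5, 5, 5, 450, 680

frame = cv2.imread("frame_000001.jpg")  # hypothetical video frame
h, w = frame.shape[:2]
# Clamp to the frame size in case the video is smaller than the hardcoded box.
crop = frame[max(0, y1):min(h, y2), max(0, x1):min(w, x2)]
cv2.imwrite("crop_000001.jpg", crop)
```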
I was going to run SBR on my own videos and got stuck on a question: where do I get face bounding boxes for my unlabeled videos? Can I use cv2/dlib/etc. to detect faces? Do I need to widen/adjust their predictions?
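On the widening question, one common approach (just a sketch, not something confirmed in this thread) is to symmetrically enlarge the detector's box by a fixed ratio so it is closer to the looser boxes many landmark datasets use; the ratio below is an arbitrary example worth tuning against whichever detector you pick:

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, ratio=0.2):
    """Enlarge a face box by `ratio` of its size on each side, clamped to the image."""
    dx, dy = (x2 - x1) * ratio, (y2 - y1) * ratio
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(img_w), x2 + dx), min(float(img_h), y2 + dy))

# Hypothetical usage with a detector box on a 1280x720 frame.
print(expand_box(300, 200, 500, 450, 1280, 720, ratio=0.2))
```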