D-X-Y / landmark-detection

Four landmark detection algorithms, implemented in PyTorch.
https://xuanyidong.com/assets/projects/TPAMI-2020-SRT.html
MIT License

lk_target_loss returns None; scale of batch_locs? How to set forward_max and fb_thresh? #73

Closed zhaoruiqiff closed 4 years ago

zhaoruiqiff commented 4 years ago

Thank you for the great work!

I'm trying to fine-tune my pretrained model using SBR, and I have a question about data scales.

For images, should we normalize the pixel values to [0, 1] and then further normalize each channel with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]?

For the predicted point locations (batch_locs, batch_next, batch_fbak, batch_back) used to compute lk_target_loss: are batch_locs pixel locations in image coordinates? For example, if my image size is 256×256, will batch_locs range from 0 to 255?

Since I'm training on a different video dataset, how should I pick forward_max and fb_thresh? In your example fb_thresh = 1 and forward_max = 2, which seem too small for my 256×256 images: all sequence_checks become False after the loops, every loss is masked out, and lk_target_loss returns None instead of a loss. Can you point out what I did wrong? What scale should the predicted point locations batch_locs be in to compute lk_target_loss, and how should I choose forward_max and fb_thresh?

Thanks!

D-X-Y commented 4 years ago

Thanks for your question!

For the normalization, YES. You can find more details here: https://github.com/D-X-Y/landmark-detection/blob/e382e2ba646e2f9667606d1eb0c91fcc4ea14c87/SRT/lib/procedure/starts.py#L35
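The normalization the answer confirms can be sketched as follows. This is a minimal illustration of the standard ImageNet-style pipeline (scale to [0, 1], then standardize per channel); the repo's actual transforms live in the linked file and may differ in detail.

```python
import torch

# Per-channel ImageNet statistics from the question above,
# reshaped to broadcast over a (3, H, W) image tensor.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(image_uint8: torch.Tensor) -> torch.Tensor:
    """image_uint8: (3, H, W) uint8 tensor with values in [0, 255]."""
    x = image_uint8.float() / 255.0  # scale pixels to [0, 1]
    return (x - MEAN) / STD          # per-channel standardization
```

In torchvision this is the same as composing `ToTensor()` with `Normalize(mean, std)`.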

For batch_locs, it depends. I remember that in some places I use normalized locations (0-1) and in others I use raw locations (0-255 for a resolution of 256). You can simply print its min and max values to check whether it is normalized.
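The min/max check suggested above could look like this. It is a quick diagnostic of my own, not code from the repo; the 1.5 cutoff is an arbitrary heuristic for telling [0, 1]-normalized coordinates apart from pixel coordinates.

```python
import torch

def locs_range(batch_locs: torch.Tensor) -> str:
    """Report the value range of predicted locations and guess their scale."""
    lo, hi = batch_locs.min().item(), batch_locs.max().item()
    if hi <= 1.5:  # heuristic cutoff: pixel coords would far exceed this
        scale = "likely normalized to [0, 1]"
    else:
        scale = "likely raw pixel coordinates"
    return f"min={lo:.3f}, max={hi:.3f} -> {scale}"
```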

Yes, forward_max and fb_thresh are hyperparameters tuned for our paper; if they are too small in your case, you can increase them. First of all, I would suggest using our new SRT (not SBR) code, which has an accelerated version of the LK module that pre-computes the flow. Second, you need to pre-train your model on the labeled dataset first and use that pre-trained model as the initialization for SBR or SRT. Third, here is the logic for masking (https://github.com/D-X-Y/landmark-detection/blob/master/SRT/lib/procedure/temporal_loss_regression.py#L44): you can stop there, check which conditions become all False, and loosen the corresponding constraints.
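A hedged sketch of the kind of forward-backward consistency check this masking implements. The function name and the exact semantics of `forward_max` (cap on forward motion) and `fb_thresh` (cap on round-trip error) are my reading of the thread, not the repo's code (see the linked temporal_loss_regression.py for the real logic). The key point for the question above: both thresholds are in the same units as the point coordinates, so pixel-scale locations need proportionally larger thresholds than [0, 1]-normalized ones.

```python
import torch

def sequence_check(locs: torch.Tensor,
                   fwd: torch.Tensor,
                   bak: torch.Tensor,
                   fb_thresh: float = 1.0,
                   forward_max: float = 2.0) -> torch.Tensor:
    """locs, fwd, bak: (N, 2) point locations in the same coordinate scale.

    A point passes only if its forward tracking motion is small enough
    AND the forward-backward round trip returns close to where it started.
    """
    forward_dist = (fwd - locs).norm(dim=-1)  # how far the point moved forward
    fb_error = (bak - locs).norm(dim=-1)      # round-trip inconsistency
    return (forward_dist <= forward_max) & (fb_error <= fb_thresh)
```

With the defaults above, a point tracked 5 pixels forward in a 256×256 image is already rejected, which matches the all-False masks described in the question.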

zhaoruiqiff commented 4 years ago

Thanks for answering my questions! I have been able to make the code work.

I tried the SRT code and used the SBR loss as defined in temporal_loss_regression.py. I used my pretrained model as the initialization and fine-tuned on my video dataset for a few epochs with both the SBR loss and my regression loss, using the same settings as in REG-300W-SBR-300VW-P68.sh.

Based on my metric, the landmark detection stability on video improved by about 16%. Before fine-tuning with your SBR loss, my results already looked visually satisfying, so this is quite an improvement. Thanks for your great work! The demo video on the GitHub page looks very impressive; how can I make my detection results on video look as impressive as that one? How should I train my model on videos, and are there any important parameters to adjust?

Thank you!

D-X-Y commented 4 years ago

Great to hear you got a 16% improvement :) W.r.t. the demo video, emmm, not all videos look as good as that one; I picked some of the more impressive ones from the 300VW videos.