choyingw / SynergyNet

3DV 2021: Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry
MIT License
368 stars 56 forks

Both versions of the benchmark annotation seem inaccurate... or did I do something wrong? #30

Open ken881015 opened 1 year ago

choyingw commented 1 year ago

I’m not sure what inaccuracy you are referring to.

For the ground truth: the original AFLW2000-3D annotation indeed contains many errors. Its annotation process (described in the associated CVPR 2016 paper) is automatic rather than manual, with several failure cases like the ones you show here. There is a reannotated version of AFLW2000-3D as a remedy, but it is more recent, and the convention is to compare on the original one. I think it makes more sense to evaluate on the reannotated version. Alternatively, there are some more recent, better-quality datasets for facial alignment.
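
For reference, a minimal sketch of the bounding-box-normalized NME these benchmarks conventionally report (the 3DDFA-style protocol); the array shapes follow the .npy files under ./aflw2000_data/eval/ in this repo, and swapping the ground-truth file for AFLW2000-3D-Reannotated.pts68.npy evaluates against the reannotation:

```python
import numpy as np

def nme_percent(pred, gt):
    """Mean NME in %, with pred and gt of shape (N, 2, 68) (x/y landmarks).

    Each image's mean point-to-point error is normalized by the
    ground-truth landmark bounding-box size sqrt(w * h).
    """
    errs = []
    for p, g in zip(pred, gt):
        w = g[0].max() - g[0].min()
        h = g[1].max() - g[1].min()
        errs.append(np.linalg.norm(p - g, axis=0).mean() / np.sqrt(w * h))
    return 100.0 * float(np.mean(errs))

# hypothetical usage: gt = np.load("./eval/AFLW2000-3D.pts68.npy")[:, :2, :]
```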

For the model, I agree there is still some room for improvement. The model tends to produce large errors on highly occluded or very-large-pose faces, but that is what we always struggle with.

On Tuesday, June 27, 2023, ken881015 @.***> wrote:


Hello, I really appreciate the work you have done. SynergyNet is not only lightweight but also keeps acceptable accuracy on AFLW2000-3D.

Although I am also one of the trainers who cannot reproduce the 3.4% NME on the original benchmark annotation (my best is 3.674% after fixing the code problem in https://github.com/choyingw/SynergyNet/issues/18#issuecomment-1600030352), I kept trying to analyze which kinds of images the model fails on and to improve it. So I sorted the NME of the 2000 images and made a grid of the 48 worst images with the model's alignment drawn on them, showing the ground-truth annotation beside each (for each pair, the left is the model output and the right is the ground truth, with its reannotated version). [image: grid_of_worst_alignment_0~47_re_v2_fix_loss_problem_80] https://user-images.githubusercontent.com/38501223/249102379-3799a5e7-5439-4466-9bf8-73dfd3703a3e.png

As you can see, some ground-truth annotations are not accurate: with indices starting from 1, pairs (1,1) and (1,2) obviously have annotations that are not worth using as reference. But other pairs (e.g. (8,6)) show that the large NME comes from model performance rather than inaccurate annotation; in other words, there is still a chance for improvement.

Here is part of my code to post-process the files (roi_box, pts68, ...) you offer in the repo and visualize the alignment on an image. For the inaccuracy problem, did I do anything wrong? Or is there any opinion you can share with us? I would really appreciate it.

Put this code in ./aflw2000_data/ and you can run it:

```python
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

# you can select by image name
img_name = "image02156.jpg"
img = plt.imread("./AFLW2000-3D_crop/" + img_name)

# choose the version of benchmark annotation (original or reannotated)
pts68 = np.load("./eval/AFLW2000-3D.pts68.npy")
pts68 = np.load("./eval/AFLW2000-3D-Reannotated.pts68.npy")

bbox = np.load("./eval/AFLW2000-3D_crop.roi_box.npy")
fname_list = Path("./AFLW2000-3D_crop.list").read_text().strip().split('\n')

# map landmark coordinates into the 120x120 crop
pts68[:, 0, :] = (pts68[:, 0, :] - bbox[:, [0]]) / (bbox[:, [2]] - bbox[:, [0]]) * 120
pts68[:, 1, :] = (pts68[:, 1, :] - bbox[:, [1]]) / (bbox[:, [3]] - bbox[:, [1]]) * 120

fig, ax = plt.subplots()

# plot image
ax.imshow(img)

# scatter landmarks
idx = fname_list.index(img_name)
ax.scatter(pts68[idx, 0, :], pts68[idx, 1, :])

fig.savefig("alignment.jpg")
```
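
For completeness, a condensed sketch of how the worst-48 grid above could be produced, reusing pts68 and fname_list from the snippet; the per-image NME array and its file name nme_per_image.npy are hypothetical, assumed to have been computed beforehand:

```python
# assumes the variables from the snippet above, plus a hypothetical
# per-image NME array of shape (2000,)
nme = np.load("nme_per_image.npy")
worst = np.argsort(nme)[::-1][:48]  # indices of the 48 largest errors

fig, axes = plt.subplots(6, 8, figsize=(24, 18))
for ax, i in zip(axes.ravel(), worst):
    ax.imshow(plt.imread("./AFLW2000-3D_crop/" + fname_list[i]))
    ax.scatter(pts68[i, 0, :], pts68[i, 1, :], s=2)
    ax.set_title(f"NME {nme[i]:.3f}", fontsize=8)
    ax.axis("off")
fig.savefig("grid_of_worst_alignment.jpg")
```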



choyingw commented 1 year ago

Thanks for your clear visualization. As far as I know, annotating 3D landmarks is very challenging; some recent datasets such as the NoW benchmark or DAD-3DHeads (https://www.pinatafarm.com/research/dad-3dheads) may be a better choice. Otherwise, you can manually filter out bad annotations in the AFLW2000-3D reannotation, which is reasonable in my opinion.
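
One possible sketch of such filtering, under the assumption that images where the original and reannotated landmarks strongly disagree are the unreliable ones; the 95th-percentile cutoff is an arbitrary illustration, not a recommendation from the repo:

```python
import numpy as np

# both annotation files ship with this repo under ./aflw2000_data/eval/
pts_ori = np.load("./eval/AFLW2000-3D.pts68.npy")
pts_re = np.load("./eval/AFLW2000-3D-Reannotated.pts68.npy")

# mean per-landmark x/y distance between the two annotation versions
disagree = np.linalg.norm(pts_ori[:, :2, :] - pts_re[:, :2, :], axis=1).mean(axis=1)

# drop the most contested images; large disagreement usually means at
# least one of the two annotations is unreliable
keep = disagree < np.percentile(disagree, 95)
print(f"kept {keep.sum()} / {len(keep)} images")
```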

Random erasing, in my opinion, may help in some limited cases such as cropped-out faces, but other occlusion types such as hands or scarves are hard, since the occlusion shape is irregular. An easy hack is to add those erased images to the training set and see whether this trick helps in the cropped-out-face cases.
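
For instance, a minimal sketch with torchvision's stock RandomErasing; the parameters are illustrative and untuned, and for landmark regression the erased patch must not change the landmark targets, so it should only touch the input image:

```python
import torchvision.transforms as T

# erase a random rectangle in up to half of the training images;
# scale/ratio values are illustrative, not tuned for face alignment
train_transform = T.Compose([
    T.ToTensor(),
    T.RandomErasing(p=0.5, scale=(0.02, 0.2), ratio=(0.3, 3.3), value=0),
])
```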

About the NME, I am not sure what causes this phenomenon (maybe the learning-rate change points), but I think it is reasonable that the best NME happens after the milestones, since a milestone marks when the learning rate is reduced, and the lower LR indicates better convergence. Something we have in mind (but have not fully tested yet) is that a lower final LR and more training epochs may help attain better minima.
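
A minimal sketch of that idea with PyTorch's MultiStepLR; the milestones, epoch count, and base LR below are placeholders, not the repo's actual schedule:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=0.08, momentum=0.9)

# later milestones plus extra epochs push the final LR lower, which is
# the "better minima" hypothesis above
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[48, 64, 72], gamma=0.1)

for epoch in range(80):
    # ... one training epoch here ...
    scheduler.step()
```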



choyingw commented 1 year ago

For AFLW2000-3D, if there were some remedy for out-of-distribution cases (occlusion, underwater scenes, or very large poses), the NME could still be improved, since those cases can greatly drag down the overall performance. Most previous facial-landmark methods focus on learning a good representation for faces but do not specifically incorporate priors for OOD data. Happy to discuss more via email.

