cassiePython / cddfm3d

Cross-Domain and Disentangled Face Manipulation with 3D Guidance
MIT License
106 stars 11 forks

Failed to train the APNet #3

Open lubovbyc opened 2 years ago

lubovbyc commented 2 years ago

Following the default procedure and parameter settings, I cannot train the APNet successfully. For different latent inputs, the network always produces the same result, e.g. the mean face. [attached result image: 00001253]

I tried to print the output of each layer (as shown below). It seems the network has already collapsed.

[screenshot: per-layer outputs]
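A rough sketch of this kind of check (the sizes and the stand-in MLP below are placeholders, not the repo's actual APNet): feed several different latents and look at how much the predicted parameters vary across them.

```python
import torch
import torch.nn as nn

# Placeholder dimensions and a stand-in MLP; in practice this would be the trained APNet.
latent_dim, param_dim = 512, 257
apnet = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, param_dim),
)

with torch.no_grad():
    z = torch.randn(8, latent_dim)        # 8 different latent codes
    params = apnet(z)                     # predicted 3DMM parameters
    spread = params.std(dim=0).mean()     # ~0 means every latent maps to the same output
    print(f"mean std of outputs across different latents: {spread.item():.6f}")
```
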
cassiePython commented 2 years ago

@lubovbyc It seems the optimization collapsed; I'm not sure why. Have you tried other losses, such as the PDC loss or the VDC loss? You could also combine all the losses or try another optimizer. If it is still bugging you, please share your generated data with me and I will try it on my server.
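A rough, self-contained sketch of what I mean (the sizes, names, and weights below are placeholders rather than the repo's actual code): weight a parameter-distance term against a vertex-distance term and switch to another optimizer such as AdamW.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder sizes, a stand-in APNet, and a random linear "3DMM basis" for illustration.
latent_dim, param_dim, n_vertices = 512, 257, 1000
apnet = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, param_dim))
basis = torch.randn(param_dim, n_vertices * 3)

optimizer = torch.optim.AdamW(apnet.parameters(), lr=1e-4, weight_decay=1e-4)
w_pdc, w_vdc = 1.0, 0.1                                  # example loss weights, to be tuned

# Fake batch standing in for (latent, pseudo-GT parameter) pairs from the generated data.
z = torch.randn(16, latent_dim)
gt_params = torch.randn(16, param_dim)
gt_vertices = gt_params @ basis

pred_params = apnet(z)
loss_pdc = F.mse_loss(pred_params, gt_params)            # PDC-style parameter distance
loss_vdc = F.mse_loss(pred_params @ basis, gt_vertices)  # VDC-style vertex distance
loss = w_pdc * loss_pdc + w_vdc * loss_vdc

optimizer.zero_grad()
loss.backward()
optimizer.step()
```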

lubovbyc commented 2 years ago

@cassiePython Thanks for your reply!

> Have you tried other losses, such as the PDC loss or the VDC loss? You could also combine all the losses or try another optimizer.

Yes. I have tried different losses several times with different weights, and I also attempted to decrease the initial learning rate. But it seems that all these attempts only slow down the collapse.

I'm not sure whether I missed some important detail. Is APNet training stable in normal cases? I have uploaded the generated data to Google Drive; please check it when you are available. Thanks a lot!

cassiePython commented 2 years ago

@lubovbyc Thanks for sharing. I will check it right after the SIGGRAPH Asia submission. Thanks for your patience.

cassiePython commented 2 years ago

@lubovbyc Hi. I have tried to alleviate this. You can add more MLP layers to the APNet and use the PDC loss to improve robustness.
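For what it's worth, "more MLP layers" could look roughly like the following (the layer sizes are illustrative, not the actual APNet definition in the repo):

```python
import torch.nn as nn

def build_apnet(latent_dim=512, param_dim=257, hidden=512, n_layers=6):
    """Stand-in for a deeper APNet head: a plain MLP that regresses the 3DMM parameters."""
    layers, dim = [], latent_dim
    for _ in range(n_layers):
        layers += [nn.Linear(dim, hidden), nn.LeakyReLU(0.2)]
        dim = hidden
    layers.append(nn.Linear(dim, param_dim))
    return nn.Sequential(*layers)

apnet = build_apnet()
```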

lubovbyc commented 2 years ago

@cassiePython Thanks for your reply! I will give it a shot and check whether it works.

roundchuan commented 2 years ago

> @lubovbyc Hi. I have tried to alleviate this. You can add more MLP layers to the APNet and use the PDC loss to improve robustness.

Can you provide the details of the change needed to avoid the collapse? I also get the mean face with the default code.

roundchuan commented 2 years ago

> @cassiePython Thanks for your reply! I will give it a shot and check whether it works.

Did you get correct results from the APNet by following the author's advice? I am running into the same problem as you.

hxngiee commented 2 years ago

I am also struggling to get good results from the APNet. Should I use the renderer and the landmark loss as well?

cassiePython commented 2 years ago

@lubovbyc I am still puzzled by this problem; I did not encounter it on my dataset. I have attached the dataset with 4K images and the corresponding checkpoint. Please check whether it works on your machine: https://portland-my.sharepoint.com/:u:/g/personal/cwang355-c_my_cityu_edu_hk/EWMWjP8DHtpEqvhLtZTyfr0BerKDVixlbx8zApUS3QTngA?e=GXsNKw.

Also, please first try adding the PDC loss.

cassiePython commented 2 years ago

> I am also struggling to get good results from the APNet. Should I use the renderer and the landmark loss as well?

  1. Before using the pseudo ground truth to train the APNet, I tried the combination of the rendering loss and the landmark loss. Unfortunately, I failed to train the APNet with them.
  2. I have not yet tried the WPDC loss plus the rendering loss and the landmark loss (a simple weighted-parameter sketch is below).
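
One simple way to weight the parameter distance (not necessarily the exact WPDC definition; the basis and sizes below are placeholders) is to weight each parameter by how strongly it moves the mesh vertices:

```python
import torch

# Toy weighted parameter distance: weight each 3DMM parameter by the column norm
# of a stand-in linear basis, i.e. by how much it displaces the vertices.
param_dim, n_vertices = 257, 1000
basis = torch.randn(n_vertices * 3, param_dim)
weights = basis.norm(dim=0)
weights = weights / weights.sum()

pred_params = torch.randn(16, param_dim, requires_grad=True)
gt_params = torch.randn(16, param_dim)

loss_wpdc = (weights * (pred_params - gt_params) ** 2).mean()
loss_wpdc.backward()
```
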
roundchuan commented 2 years ago

> @lubovbyc I am still puzzled by this problem; I did not encounter it on my dataset. I have attached the dataset with 4K images and the corresponding checkpoint. Please check whether it works on your machine: https://portland-my.sharepoint.com/:u:/g/personal/cwang355-c_my_cityu_edu_hk/EWMWjP8DHtpEqvhLtZTyfr0BerKDVixlbx8zApUS3QTngA?e=GXsNKw.
>
> Also, please first try adding the PDC loss.

Using the model and data you provided, I still get the same face for different latent codes. However, one change is that the rendered face is no longer the mean face when using your model.

cassiePython commented 2 years ago

@roundchuan Can you get results like this:

[attached image: example results]

roundchuan commented 2 years ago

> @roundchuan Can you get results like this:
>
> [attached image: example results]

No, all the rendered faces are the same. I also printed "param_lst": the outputs of the APNet are all identical.

roundchuan commented 2 years ago

> I am also struggling to get good results from the APNet. Should I use the renderer and the landmark loss as well?
>
> 1. Before using the pseudo ground truth to train the APNet, I tried the combination of the rendering loss and the landmark loss. Unfortunately, I failed to train the APNet with them.
> 2. I have not yet tried the WPDC loss plus the rendering loss and the landmark loss.

The results are just like this when I follow your data and checkpoints.

hxngiee commented 2 years ago

> @lubovbyc I am still puzzled by this problem; I did not encounter it on my dataset. I have attached the dataset with 4K images and the corresponding checkpoint. Please check whether it works on your machine: https://portland-my.sharepoint.com/:u:/g/personal/cwang355-c_my_cityu_edu_hk/EWMWjP8DHtpEqvhLtZTyfr0BerKDVixlbx8zApUS3QTngA?e=GXsNKw. Also, please first try adding the PDC loss.

> Using the model and data you provided, I still get the same face for different latent codes. However, one change is that the rendered face is no longer the mean face when using your model.

@cassiePython Thanks for your kind reply. Can you attach constants.pkl as well? It seems that file was omitted.

cassiePython commented 2 years ago

TO ALL: Recently, I generated more datasets of different sizes (e.g. 4K, 6K, 8K, 10K, 15K, 20K) and found that on some of them training the APNet always falls into a local minimum (i.e. the mean face). Before this issue is fixed, you can use my pre-trained model: https://drive.google.com/drive/folders/1qNvRu8vLPD278FW7GS-I9p6-yxYhKZY9

I am trying to:

  1. Add more data to make the latent space compact.
  2. Add the rendering loss and landmark loss, following StyleRig and traditional face reconstruction methods (a rough sketch of this combination is below).
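
A rough, self-contained sketch of what that combination could look like (everything below is a placeholder, not StyleRig's or this repo's actual code): a 2D landmark term from a weak-perspective projection of selected 3D landmark vertices, plus a photometric term between a rendered image and the target image; the rendered image itself would come from a differentiable renderer, which is omitted here.

```python
import torch
import torch.nn.functional as F

# Placeholder tensors standing in for one training sample.
n_landmarks, img_size = 68, 256
pred_lm3d = torch.randn(n_landmarks, 3, requires_grad=True)       # 3D landmark vertices from predicted params
gt_lm2d = torch.rand(n_landmarks, 2) * img_size                   # detected 2D landmarks
scale = torch.tensor(100.0, requires_grad=True)                   # weak-perspective scale
trans = torch.zeros(2, requires_grad=True)                        # 2D translation

rendered = torch.rand(3, img_size, img_size, requires_grad=True)  # would come from a differentiable renderer
target = torch.rand(3, img_size, img_size)                        # the training image

# Weak-perspective projection of the 3D landmarks onto the image plane.
proj_lm2d = scale * pred_lm3d[:, :2] + trans

loss_lm = F.mse_loss(proj_lm2d, gt_lm2d)   # landmark loss
loss_photo = F.l1_loss(rendered, target)   # photometric / rendering loss
loss = loss_photo + 0.1 * loss_lm          # example weighting
loss.backward()
```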