Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

Some questions about reproduction #8

Closed xjtupanda closed 1 year ago

xjtupanda commented 1 year ago

Thanks for sharing the great work! I want to follow your work and I'm trying to reproduce all the experiment results. Could you provide more details about Fig.4 in the paper? I have successfully generated videos using the scripts provided, but I don't know how to export a single frame w/ or w/o background color. Moreover, how did you generate the heat map (mean & std) in the figure?

Doubiiu commented 1 year ago

Hi~ Thanks for your interest!

1. [Ref. main/render.py] In fact, I render the frames with a white background (change background_black at L163 to False), then save pred_img at L141 with cv2.imwrite() for each frame. Finally, write a function that saves each image as a .png, checking pixel values to make the white ones transparent (alpha channel).
2. It is a bit more involved. The steps are: i) calculate the adjacent-frame motion; ii) normalize it; iii) apply a color map; iv) render the mesh with that color for each vertex.
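
A minimal sketch of both steps, assuming frames come out of render.py as uint8 BGR arrays and predicted vertices as a (T, V, 3) array; the helper names make_white_transparent and vertex_motion_colors are illustrative rather than from the repo, and the JET colormap is a guess:

```python
import cv2
import numpy as np

def make_white_transparent(bgr_img, thresh=250):
    """Turn near-white background pixels of a rendered BGR frame transparent."""
    white = np.all(bgr_img >= thresh, axis=-1)          # (H, W) mask of background pixels
    alpha = np.where(white, 0, 255).astype(np.uint8)    # alpha 0 where white
    bgra = np.dstack([bgr_img, alpha])                  # (H, W, 4)
    return bgra                                         # save with cv2.imwrite('frame.png', bgra)

def vertex_motion_colors(verts):
    """verts: (T, V, 3) predicted vertex positions over T frames.
    Returns [mean_colors, std_colors], each (V, 3) uint8 BGR, for the heat maps."""
    motion = np.linalg.norm(np.diff(verts, axis=0), axis=-1)   # (T-1, V) adjacent-frame motion
    out = []
    for stat in (motion.mean(axis=0), motion.std(axis=0)):     # mean & std per vertex
        norm = (stat - stat.min()) / (stat.max() - stat.min() + 1e-8)  # normalize to [0, 1]
        u8 = (norm * 255).astype(np.uint8).reshape(-1, 1)
        out.append(cv2.applyColorMap(u8, cv2.COLORMAP_JET).reshape(-1, 3))
    return out  # feed as per-vertex colors to the mesh renderer
```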

xjtupanda commented 1 year ago

> Hi~ Thanks for your interest!
>
> 1. [Ref. main/render.py] In fact, I render the frames with a white background (change background_black at L163 to False), then save pred_img at L141 with cv2.imwrite() for each frame. Finally, write a function that saves each image as a .png, checking pixel values to make the white ones transparent (alpha channel).
> 2. It is a bit more involved. The steps are: i) calculate the adjacent-frame motion; ii) normalize it; iii) apply a color map; iv) render the mesh with that color for each vertex.

Thanks for the fast and helpful response! I'll dive into the details and try those. Thanks again.

Doubiiu commented 1 year ago

No problem~ I will close this issue then; you can reopen it whenever you need help.

xjtupanda commented 1 year ago

@Doubiiu I'm trying to reproduce the quantitative results. Could you please offer the scripts, or some guidance, for calculating the metrics, i.e., the lip vertex error and the upper-face dynamics deviation (FDD)? I didn't find a script for these two metrics, and I have no clue which vertices to include in the calculation, since I don't know the structure of the data.

Doubiiu commented 1 year ago

We have already provided that here.
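ature
For anyone else looking: the released script is the reference, but roughly, the two metrics from the paper can be sketched as below, assuming (T, V, 3) vertex arrays, a (V, 3) neutral template, and lip_idx / upper_idx vertex-index lists from the dataset's region maps. The exact conventions (squared vs. plain L2, sign of the FDD difference) should be checked against the released code:

```python
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """LVE: per-frame maximal (squared) L2 error over the lip vertices,
    averaged over all frames."""
    err = np.sum((pred[:, lip_idx] - gt[:, lip_idx]) ** 2, axis=-1)  # (T, |lip|)
    return err.max(axis=-1).mean()

def upper_face_fdd(pred, gt, template, upper_idx):
    """FDD: difference in per-vertex temporal std of upper-face motion,
    averaged over the upper-face vertices."""
    def dyn(seq):
        disp = np.linalg.norm(seq[:, upper_idx] - template[upper_idx], axis=-1)  # (T, |upper|)
        return disp.std(axis=0)                                                  # (|upper|,)
    return (dyn(gt) - dyn(pred)).mean()  # check the released script for the exact sign convention
```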

xjtupanda commented 1 year ago

> We have already provided that here.

My bad :). Didn't notice that. Have a good day~

xjtupanda commented 1 year ago

@Doubiiu

Sorry to bother you again, but I failed to reproduce the quantitative results reported in the paper, which are 4.7914e-4 and 4.1170e-5 for Lip Vertex Error and FDD respectively. I tried both training from scratch myself and using the demo checkpoint provided in the repo; the outputs are below:

Self-trained:

Frame Number: 3879
Lip Vertex Error: 5.8714e-04
FDD: 4.9687e-05

Demo checkpoint:

Frame Number: 3879
Lip Vertex Error: 1.4029e-03
FDD: 5.0113e-05

So there is still quite a gap from the reported numbers. Is there a missing trick or some parameter tuning, or is it just random-seed fluctuation? If it's the latter, I might have to run multiple times, or spend a lot of effort on heavy parameter tuning, to catch up with the SOTA performance. :(

Doubiiu commented 1 year ago

Hi. I am not sure if there were some issues during your testing? I tested the released metric-calculation code with the released checkpoint, and the results should be exactly the same as those reported in the paper. You may need to tune the LR and evaluate some intermediate checkpoints both quantitatively and qualitatively. (The quantitative metrics may be just a reference and are not sufficient to guarantee good visual results.) BTW, I think the VQ training is not that stable: my model somehow crashes when I change just a few hyper-parameters.
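
A hypothetical sweep over intermediate checkpoints in that spirit; get_predictions and the checkpoint path pattern are placeholders, and the metric helpers are the sketches from earlier in this thread:

```python
import glob

# Hypothetical: score each intermediate stage-2 checkpoint on the test split,
# then inspect the best few visually as well, per the advice above.
for ckpt in sorted(glob.glob('checkpoints/stage2_epoch*.pth.tar')):  # placeholder pattern
    pred = get_predictions(ckpt)                    # placeholder: model inference -> (T, V, 3)
    lve = lip_vertex_error(pred, gt, lip_idx)       # metric sketches from above
    fdd = upper_face_fdd(pred, gt, template, upper_idx)
    print(f'{ckpt}  LVE={lve:.4e}  FDD={fdd:.4e}')
```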

xjtupanda commented 1 year ago

> Hi. I am not sure if there were some issues during your testing? I tested the released metric-calculation code with the released checkpoint, and the results should be exactly the same as those reported in the paper. You may need to tune the LR and evaluate some intermediate checkpoints both quantitatively and qualitatively. (The quantitative metrics may be just a reference and are not sufficient to guarantee good visual results.) BTW, I think the VQ training is not that stable: my model somehow crashes when I change just a few hyper-parameters.

Yeah, I might also need to check the qualitative results. But that's quite odd for the quantitative part. I used the biwi_stage2.pth.tar and followed the instructions exactly. Are you sure it's the same checkpoint as the demo one? If so, I might have to process the data from scratch to avoid data errors, which is quite a headache.

Doubiiu commented 1 year ago

I am sure about the checkpoint. You may check the data to ensure there is no problem. BTW, as long as the data is correctly processed (maybe some difference is allowed?), it is fair to test all methods, including yours, on the same processed dataset.

xjtupanda commented 1 year ago

> I am sure about the checkpoint. You may check the data to ensure there is no problem. BTW, as long as the data is correctly processed (maybe some difference is allowed?), it is fair to test all methods, including yours, on the same processed dataset.

I followed the exact process in the markdown file in this repo; maybe some step accidentally went wrong. I might as well process the data again. If the same thing happens, I will compare against previous methods on the same processed dataset. Anyway, thanks so much for your help.

youngstu commented 1 year ago

I have the same problem. The metrics of the pretrained model biwi_stage2.pth.tar are almost the same as in the paper, but when I retrained the stage-2 model with the official code, the metrics did not align:

Official pretrained model:

Frame Number: 3879
Lip Vertex Error: 4.7987e-04
FDD: 4.1262e-05

Stage 2 retrained with official code:

Frame Number: 3879
Lip Vertex Error: 5.2776e-04
FDD: 4.4944e-05

youngstu commented 1 year ago

> Hi. I am not sure if there were some issues during your testing? I tested the released metric-calculation code with the released checkpoint, and the results should be exactly the same as those reported in the paper. You may need to tune the LR and evaluate some intermediate checkpoints both quantitatively and qualitatively. (The quantitative metrics may be just a reference and are not sufficient to guarantee good visual results.) BTW, I think the VQ training is not that stable: my model somehow crashes when I change just a few hyper-parameters.

@Doubiiu Which epoch does biwi_stage2.pth.tar correspond to, epoch 100 or an intermediate one? Could you provide the random seed to facilitate reproduction? Thanks.