JeremyCJM / DiffSHEG

[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
https://jeremycjm.github.io/proj/DiffSHEG/
BSD 3-Clause "New" or "Revised" License

Doubts #21

Closed prinshul closed 1 month ago

prinshul commented 1 month ago

Hi @JeremyCJM

Thank you for your support.

I have a few queries and concerns regarding the code; it would be great if you could resolve them.

  1. To give context, I am finetuning the model on speaker number 13 with a small amount of data, using the pretrained model you've provided. While finetuning, it also performs validation, and the FGD I am getting is around 350 million, which is not believable. Can you let me know whether this behavior is expected or whether something might be going wrong?

  2. I also wanted to test the pretrained model you've provided in the repo, but while testing I get an OS error stating that these files are not found: test_ddpm_cjm_GesExpr.py and test_ddpm_cjm.py. I checked the BEAT repo but could not find these files. It would be great if you could point me to their locations or send them via mail.

  3. In addition, if you could also send the script for preparing the lmdb cache for the speech files, that would be great, so that I can compare it with my script and check whether anything is processed incorrectly!

JeremyCJM commented 1 month ago

Hi Prinshul,

> 1. To give context, I am finetuning the model on speaker number 13 with a small amount of data, using the pretrained model you've provided. While finetuning, it also performs validation, and the FGD I am getting is around 350 million, which is not believable. Can you let me know whether this behavior is expected or whether something might be going wrong?

This is because I only trained the autoencoder on subjects 2, 4, 6, and 8, following the setting of CaMN in the BEAT paper.

> 2. I also wanted to test the pretrained model you've provided in the repo, but while testing I get an OS error stating that these files are not found: test_ddpm_cjm_GesExpr.py and test_ddpm_cjm.py. I checked the BEAT repo but could not find these files. It would be great if you could point me to their locations or send them via mail.

test_ddpm_cjm_GesExpr.py and test_ddpm_cjm.py are adapted from this file in the BEAT GitHub repo: https://github.com/PantoMatrix/PantoMatrix/blob/main/scripts/BEAT_2022/test.py

> 3. In addition, if you could also send the script for preparing the lmdb cache for the speech files, that would be great, so that I can compare it with my script and check whether anything is processed incorrectly!

For the speech feature preparation, you can directly refer to the code here: https://github.com/JeremyCJM/DiffSHEG/blob/3ebf3058f48cba3da9146afb7623e9ec1ab9e9a5/trainers/ddpm_beat_trainer.py#L1430
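In case a comparison point helps, here is a minimal sketch of what such a cache-building step can look like, assuming librosa for audio loading and lmdb plus pickle for storage. The function name, key scheme, and 16 kHz sample rate are illustrative assumptions; the linked ddpm_beat_trainer.py code remains the authoritative reference.

```python
# Minimal sketch of building an lmdb cache of speech features.
# All names here (function, key scheme, 16 kHz rate, map size) are
# illustrative assumptions; the linked ddpm_beat_trainer.py code in the
# repo is the authoritative implementation.
import pickle

import lmdb
import librosa
import numpy as np

def build_speech_cache(wav_paths, lmdb_path, sr=16000):
    """Load each wav and store the raw waveform keyed by sample index."""
    env = lmdb.open(lmdb_path, map_size=1 << 32)  # ~4 GB map size
    with env.begin(write=True) as txn:
        for idx, wav_path in enumerate(wav_paths):
            # librosa resamples to the target rate on load
            audio, _ = librosa.load(wav_path, sr=sr)
            value = pickle.dumps(audio.astype(np.float32))
            txn.put(str(idx).encode("utf-8"), value)
    env.close()
```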

prinshul commented 1 month ago

> This is because I only trained the autoencoder on subjects 2, 4, 6, and 8, following the setting of CaMN in the BEAT paper.

But can I further train the autoencoder on a new speaker like 13?

JeremyCJM commented 1 month ago

Yes, of course. You can comment out the evaluation code to train only the motion generation model. If you care about the Fréchet distance, you can first train or finetune a new autoencoder on (or including) your additional training data, and then use that autoencoder for evaluation.
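For reference, the Fréchet distance used for FGD is the standard one between Gaussian fits of the two latent distributions. Below is a minimal sketch with numpy/scipy, assuming you have already encoded real and generated motion into (N, D) latent arrays with the autoencoder; the encoding step itself is omitted, and the repo's own evaluation code is authoritative.

```python
# Minimal sketch of a Frechet (Gesture) Distance between two sets of
# autoencoder latents. Only the standard Frechet formula is shown:
#   ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * sqrt(C1 @ C2))
import numpy as np
from scipy import linalg

def frechet_distance(latents_real, latents_gen):
    """latents_*: (N, D) arrays of autoencoder latent codes."""
    mu1, mu2 = latents_real.mean(axis=0), latents_gen.mean(axis=0)
    cov1 = np.cov(latents_real, rowvar=False)
    cov2 = np.cov(latents_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov1 @ cov2, disp=False)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))
```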

xungeer29 commented 1 month ago

> test_ddpm_cjm_GesExpr.py and test_ddpm_cjm.py are adapted from this file in the BEAT GitHub repo: https://github.com/PantoMatrix/PantoMatrix/blob/main/scripts/BEAT_2022/test.py

How does one adapt https://github.com/PantoMatrix/PantoMatrix/blob/main/scripts/BEAT_2022/test.py into test_face_ddpm_cjm.py, test_ddpm_cjm_GesExpr.py, and test_ddpm_cjm.py?