JeremyCJM / DiffSHEG

[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
https://jeremycjm.github.io/proj/DiffSHEG/
BSD 3-Clause "New" or "Revised" License

The test datasets used for evaluation metrics #19

Open lovemino opened 2 weeks ago

lovemino commented 2 weeks ago

Thank you very much for your contribution and for sharing it. I have always been curious about the evaluation metrics for co-speech generation, and I would like to ask whether the test datasets used for the metrics in your paper are the same as the ones used for CaMN. I noticed that the test datasets in your code differ somewhat from CaMN's in terms of LMDB loading. If you could spare some time to answer this, I would be very grateful.

JeremyCJM commented 2 weeks ago

Hi lovemino, the test set should be the same as CaMN's, except that I converted the Euler rotations into axis-angle format. Therefore, the autoencoders used to compute Fréchet Distances were also retrained on axis-angle rotations. Note that CaMN's Fréchet Distance code did not turn on evaluation mode; the results in our paper were obtained after correcting this issue.
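The eval-mode point matters because a BatchNorm-style layer normalizes with batch statistics in train mode but with stored running statistics in eval mode, so the latents (and therefore the Fréchet distance computed from them) differ. A minimal numpy illustration of the effect (not the repo's actual network):

```python
import numpy as np

# Toy BatchNorm: train mode uses the current batch's statistics,
# eval mode uses the stored running statistics.
def batchnorm(x, running_mean, running_var, training, eps=1e-5):
    if training:
        mean, var = x.mean(0), x.var(0)
    else:
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(8, 4) * 3 + 5            # batch stats far from (0, 1)
train_out = batchnorm(x, np.zeros(4), np.ones(4), training=True)
eval_out = batchnorm(x, np.zeros(4), np.ones(4), training=False)
print(np.allclose(train_out, eval_out))      # False: the two modes disagree
```

Forgetting `model.eval()` therefore silently changes every latent the FGD is computed from.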

lovemino commented 2 weeks ago


Thank you very much for your reply. I found your paper meticulously written, and it is certainly a valuable read. However, as I attempted to reproduce the experimental results, I encountered some difficulties. Specifically, I found that the LMDB used in CaMN caused errors when used in your project.

Would it be possible for you to provide the processed LMDB file for the test datasets? I would greatly appreciate it.

Thank you for your contribution and response.

JeremyCJM commented 2 weeks ago

Hi, this is because I am using the latest version of lmdb. You can try replacing the lmdb in BEAT with the latest version to generate the data cache. Regarding releasing the processed test dataset, I would need to check the license of the dataset. Let me know if upgrading lmdb helps :)

lovemino commented 2 weeks ago

I am very grateful for your prompt reply and look forward to you making the LMDB public. Once again, I want to express my appreciation for your contribution, and I look forward to your future papers. Thank you very much.

lovemino commented 1 week ago

Based on your response, I used the build_cache function from your beat.py and the test set from CaMN to generate a new data.mdb file. Then, using the ges_axis_angle_300 weights and the CaMN test code, I calculated the axis-angle gestures of the 141 upper-body joints (before converting to a matrix) during model inference. However, the resulting FGD was 4282.439, roughly ten times the 438.93 reported in your paper. I would like to ask if you could kindly provide the code and files you used to calculate these metrics, so that I can accurately reproduce the results reported in your paper.
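For context, the FGD being compared here is a Fréchet distance between two Gaussians fitted to autoencoder latents. A generic numpy/scipy sketch of that computation (not the repo's exact implementation):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feat_a, feat_b):
    """Fréchet distance between two feature sets (rows = samples):
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2))."""
    mu_a, mu_b = feat_a.mean(0), feat_b.mean(0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):   # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Because the metric depends entirely on the latent statistics, any mismatch in normalization, eval mode, or windowing upstream of the autoencoder can inflate it by an order of magnitude.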


JeremyCJM commented 1 week ago

Hi lovemino, before sharing the code, here are some traps you can check:

  • Have you turned on the eval mode?
  • Have you normalized the motion before feeding it into the AutoEncoder of FGD?
  • What is your FGD of CaMN with our autoencoder checkpoint (converted to axis-angle)? Is it also very large?

lovemino commented 1 week ago
Hello, following your suggestion, I tested CaMN using your ges_axis_angle_300.bin. I converted the results and the dataset of CaMN from Euler angles to axis-angle using the conversion scripts you provided. Additionally, I used the mean and std .npy files together with your ges_axis_angle_300.bin, but the resulting FGD is 800.22, which differs from the 1635.44 reported in your paper. This issue has troubled me for a long time. If you could kindly provide your test dataset via email, I would be immensely grateful. I appreciate your work and your response. Here is my Google email: lbj1040702929@gmail.com. Thank you.
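The mean/std normalization mentioned above is the second trap in the checklist: motion must be standardized with the training-set statistics before it reaches the FGD autoencoder. A hedged sketch, assuming per-dimension `.npy` statistics (file shapes are assumptions, not the repo's guaranteed layout):

```python
import numpy as np

def normalize(motion, mean, std, eps=1e-8):
    """motion: (N, T, D) clips; mean/std: (D,) arrays, e.g. loaded
    via np.load('mean.npy') / np.load('std.npy')."""
    return (motion - mean) / (std + eps)

# Illustrative statistics for the 141-dim axis-angle pose vector.
mean = np.zeros(141)
std = np.ones(141)
motion = np.random.randn(2, 34, 141)
out = normalize(motion, mean, std)
print(out.shape)   # (2, 34, 141)
```

Feeding unnormalized motion to an autoencoder trained on normalized data shifts every latent, which is one common cause of wildly inflated FGD values.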

JeremyCJM commented 1 week ago

Hi liubeibei, here is the link for the processed test set of BEAT. Please comply with the original license and restrictions of the BEAT dataset. Cheers!

lovemino commented 1 week ago


I am truly grateful for your willingness to provide the test set; you are a real lifesaver. If possible, could you also provide the relevant code for testing the FGD metric? Following the evaluation metrics written in your train function results in an error in motion_autoencoder, because CaMN uses a sliding window to generate multiple latents. With your method, I only get one latent of shape (1, 34, 192) for batch size 1, whereas CaMN gets around 83 latents through the sliding window, resulting in a shape of (83, 34, 192).

Yours:

         latent_out = self.eval_model(outputs[:, :34, :].float())
         latent_ori = self.eval_model(motions[:, :34, :].float())

CaMN:

         for j in range(num_divs):
             if j == 0:
                 cat_results = myoutputs[:, j*stride:j*stride+pose_length, :]  # [83, 34, 141]
                 cat_targets = tar_pose2[:, j*stride:j*stride+pose_length, :]
             else:
                 cat_results = torch.cat([cat_results, myoutputs[:, j*stride:j*stride+pose_length, :]], 0)
                 cat_targets = torch.cat([cat_targets, tar_pose2[:, j*stride:j*stride+pose_length, :]], 0)
         latent_out = self.eval_model(cat_results.float())
         latent_ori = self.eval_model(cat_targets.float())

One guess: according to the data.mdb you provided, the shape of the pose/pose_axis_angle variable in the test_dataset I read is [855, 141], not [256, 34, 141] as in your comments. So whether I follow your method in the train function or use a sliding window like CaMN, I still don't get the correct metric. If you could provide the testing code, I would be incredibly grateful. You are truly a kind person, an angel! Here is my Google email: lbj1040702929@gmail.com. Thank you.
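For what it's worth, an (855, 141) sequence is consistent with the roughly 83 latents mentioned above if it is sliced CaMN-style into overlapping windows. A generic sketch (the window length of 34 comes from the thread; the stride of 10 is an assumption):

```python
import numpy as np

def sliding_windows(seq, win=34, stride=10):
    """Slice a (T, D) motion sequence into overlapping windows of
    shape (num_divs, win, D), mirroring CaMN-style evaluation batching."""
    num_divs = (seq.shape[0] - win) // stride + 1
    return np.stack(
        [seq[j * stride : j * stride + win] for j in range(num_divs)], axis=0
    )

pose = np.zeros((855, 141))   # shape observed in the provided data.mdb
wins = sliding_windows(pose)
print(wins.shape)             # (83, 34, 141): (855 - 34) // 10 + 1 = 83 windows
```

Each window would then be encoded separately, producing the (83, 34, 192) latent batch the CaMN code expects.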

lovemino commented 6 days ago

Hello, I have a simple question. Did you use the BEAT 2, 4, 6, 8 subset but with a different processing method, which is why you retrained the autoencoder? And when comparing with CaMN, did you convert the results from the CaMN model to axis-angle to calculate the metrics, or did you retrain the entire CaMN model using your processed dataset? Thank you very much! I would be very grateful if you could answer.
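The Euler-to-axis-angle conversion discussed throughout this thread can be sketched with scipy; note that the Euler order and degree convention below are assumptions and should be checked against the repo's own conversion scripts:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# One joint rotated 30 degrees about X, expressed as Euler angles.
# 'XYZ' order and degrees are illustrative, not BEAT's confirmed convention.
euler = np.array([[30.0, 0.0, 0.0]])
axis_angle = R.from_euler("XYZ", euler, degrees=True).as_rotvec()
print(np.round(axis_angle, 4))   # ≈ [[0.5236, 0, 0]], i.e. pi/6 rad about X
```

Using a different Euler order than the dataset's actual one produces valid-looking but wrong axis-angle values, which would also distort any FGD computed from them.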