Open Shedima opened 2 years ago
You can find the method generate_gestures_by_dataset in processor_v2.py, which supports generating gestures for the GENEA dataset in addition to TED. The quantitative metrics for GENEA were evaluated by hand, following the code inside generate_gestures in processor_v2.py.
Do you mean to use generate_gestures_by_dataset to generate a sequence of poses and then evaluate quantitative metrics manually with generate_gestures?
Yes
Hello, I followed the method described above to evaluate on the GENEA dataset and found that the FGD values are very high. Following the source code, I also rendered the corresponding video and found that the generated poses are completely different from the real ones. I guess this is the reason for the high FGD. Could you please provide the complete procedure for evaluating on the GENEA dataset?
One thing I notice is that the arms in the GT seem to be vertically inverted (along the y-axis). I think the evaluation applies a vertical flip to both the GT and the predicted poses, but it might not be required for the GT. Could you try to evaluate and visualize after inverting the y-axis values of the GT back? Apart from that, we used the same error terms to evaluate on GENEA as on the TED dataset.
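In case it helps with trying this, here is a minimal sketch of the y-axis inversion. It is not code from the repository: the helper name flip_y is hypothetical, and it assumes the GT poses are a NumPy array whose last dimension holds (x, y, z) with y at index 1. The actual layout in processor_v2.py may differ, so adjust y_index accordingly.

```python
import numpy as np


def flip_y(poses: np.ndarray, y_index: int = 1) -> np.ndarray:
    """Return a copy of `poses` with the y coordinate negated.

    Assumption: the last axis of `poses` holds the coordinates,
    e.g. shape (frames, joints, 3). The input is left untouched.
    """
    flipped = poses.copy()
    flipped[..., y_index] *= -1.0
    return flipped


# Toy GT clip: 1 frame, 2 joints, (x, y, z) per joint.
gt = np.array([[[0.0, 1.0, 2.0], [0.5, -0.5, 1.0]]])
gt_fixed = flip_y(gt)
```

Applying the flip only to the GT before visualization and before feature extraction should show whether the inversion alone explains the mismatch.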
I also found that the arms in the GT are inverted vertically. I tried reversing the y-axis values, but the FGD is still very high, even though I am following the evaluation and visualization in the source code you provided. I can't reproduce the numbers reported in your paper for the GENEA dataset, so could you provide the source code you used to evaluate on GENEA?
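For anyone debugging the same problem, a sanity check on the distance computation itself can help separate a data mismatch from a metric bug. The sketch below is the generic Fréchet distance between Gaussians fitted to two feature sets, not the repository's evaluation code. The name frechet_distance and the random toy features are assumptions for illustration; in a real FGD evaluation the features would come from the trained feature extractor, which is not reproduced here.

```python
import numpy as np


def frechet_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (samples, feature_dim) with feature_dim > 1.
    """
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # covariance matrices; clip to guard against round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Toy features standing in for extractor outputs (assumption).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 3.0  # same spread, every dimension offset by 3

same = frechet_distance(real, real)     # expected to be close to 0
far = frechet_distance(real, shifted)   # dominated by the mean offset
```

If the distance between the GT features and themselves is not close to zero, the problem is in the metric code; if it is, the high value more likely comes from a preprocessing difference between the GT and the generated poses, such as the flip, scaling, or joint order.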
My apologies, but I currently don't have access to that evaluation code, and I'm not sure how soon I will be able to retrieve it. Meanwhile, you can report the higher numbers if you followed the same evaluation methodology as for the TED dataset.
How would you evaluate the quantitative performance of your model on the genea_challenge_2020 dataset? I only found the code for evaluation on the TED dataset.