Hello, we had similar results: if we set `if_categorical=False`, the FID score is better. However, for text-to-motion generation, the diversity of the generated motion is also important. When `if_categorical=False` during sampling, the "MModality" metric will be zero ("MModality" measures the diversity of human motions generated from the same text description). Therefore, I think it's better to report the results with `if_categorical=True`.
Thanks for the great explanation. MMM (https://github.com/exitudio/MMM/tree/main), which is a mask-based generative model, achieves better FID scores by using random sampling. It uses Gumbel sampling with `temperature=1`, which I think is similar to `Categorical()` sampling without top-k.
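As a rough sketch of why temperature-1 Gumbel sampling matches categorical sampling (this is the standard Gumbel-max trick, not code copied from the MMM repo):

```python
import torch

def gumbel_sample(logits, temperature=1.0):
    # Gumbel-max trick: argmax(logits + Gumbel noise) is distributed as
    # Categorical(softmax(logits)). As temperature -> 0, the scaled
    # logits dominate the noise and this reduces to plain argmax.
    eps = 1e-9
    uniform = torch.rand_like(logits)
    gumbel_noise = -torch.log(-torch.log(uniform + eps) + eps)
    return torch.argmax(logits / max(temperature, eps) + gumbel_noise, dim=-1)
```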
I think it might be better to report the FID score alone without random sampling, and to use random sampling only when reporting diversity and the other metrics.
For example, the MMM model has an FID of about 0.12 on the test set without random sampling (Gumbel-softmax with `temperature=0`), but with random sampling (Gumbel-softmax with `temperature=1`) its FID improves to about 0.08. On the other hand, my model works extremely well compared to MMM during training, and at sampling time it also reaches an FID of 0.08 on the test set without random sampling. However, when I use random sampling (Gumbel with `temperature=1`), its FID significantly worsens, to about 0.5. I also tried `top_k` and `Categorical` sampling, and in all cases my model gets worse FID scores than without random sampling.
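For completeness, this is the kind of top-k restricted sampling I mean (a generic sketch, not code from either repo):

```python
import torch

def top_k_sample(logits, k=10):
    # Keep only the k highest-probability tokens, renormalize,
    # and sample among them; k=1 reduces to greedy decoding.
    top_logits, top_idx = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(top_logits, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return top_idx.gather(-1, choice).squeeze(-1)
```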
I'm not so sure about your case with the huge FID difference under sampling. You could try printing the final probabilities right before sampling to check whether the index with the maximum probability is much higher than the others most of the time (a minimal probe sketch follows the list below):
- If not, it may be a bug in the random-sampling part (since the FID is much better without random sampling, a max probability that is close to the others would make it easy to sample some other index).
- If the maximum probability is already much higher than the others, then maybe the model is sensitive to noise. I guess this comes from the discrepancy between training and inference, since you are also using an auto-regressive model; a corrupting training strategy might help (e.g., during training, replace some tokens with random ones), as sketched after this list.
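To make the probability check concrete, a minimal probe you could drop in right before the sampling call (a hypothetical helper; adapt the shapes to your model):

```python
import torch

def inspect_probs(logits, top=5):
    # Print how peaked the next-token distribution is: if the top
    # probability dwarfs the runner-up most of the time, the model is
    # confident and sampling noise is doing the damage.
    probs = torch.softmax(logits, dim=-1)
    top_p, _ = torch.topk(probs, top, dim=-1)
    print(f"max prob: {top_p[..., 0].mean().item():.4f}, "
          f"runner-up: {top_p[..., 1].mean().item():.4f}")
```

And a minimal sketch of the corrupting strategy from the second point (random token replacement; the corruption ratio is an assumption to tune):

```python
import torch

def corrupt_tokens(tokens, vocab_size, corrupt_ratio=0.1):
    # During training, replace a fraction of ground-truth tokens with
    # random ones so the model learns to recover from its own sampling
    # mistakes at inference time.
    mask = torch.rand_like(tokens, dtype=torch.float) < corrupt_ratio
    random_tokens = torch.randint_like(tokens, vocab_size)
    return torch.where(mask, random_tokens, tokens)
```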
Thanks for your great suggestions, especially the second one. I will try that.
Hi, thanks for the great work.
I developed an autoregressive model that is somewhat similar to T2M-GPT. However, during sampling, I get better results with `if_categorical=False` than with `if_categorical=True`, on both the validation and test datasets. Do I have to use `if_categorical=True` if I want to report my FID score in my paper? https://github.com/Mael-zys/T2M-GPT/blob/7db71a28b2117abd9fc0dd402b91df72f1bc6ace/models/t2m_trans.py#L33
Thanks