In Fig. 7 of the BAMM paper, the blue meshes correspond to the blue texts and the red meshes to the red texts.
In Table 5 of the paper ("Evaluation on temporal editing tasks on HumanML3D dataset"), did you evaluate your model as in Fig. 7? That is, did you use the same text prompt for the unmasked tokens when predicting the masked tokens, or did you use different text prompts for the masked and unmasked tokens (similar to Fig. 7)?
Based on Section 4.2, "Length Prediction and Editability", I think you used the same text prompt for unmasked and masked tokens for evaluation, but different text prompts for generation. Am I correct?
If I want to generate motions for temporal editing tasks, do I have to use different text prompts for the unmasked and masked tokens (as in Fig. 7)?
Evaluation (Table 5): we used the same text prompt for evaluation, since ground truth exists only for the original prompt; there is no ground truth against which to score a different prompt.
Generation (Fig. 7): we generate twice. First, generate the whole sequence from the blue text. Then mask out the red parts and generate the masked parts conditioned on the red text. You can follow the example in the "In-between" section of MMM.
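To make the two-pass flow concrete, here is a minimal sketch of the procedure described above. The function names (`generate`, `inpaint`) and the token vocabulary size are hypothetical placeholders, not BAMM's actual API; they stand in for the model's text-conditioned generation and masked-token prediction steps, which you can adapt from the MMM "In-between" example.

```python
import numpy as np

# Hypothetical stand-ins for the model's real calls; replace with the
# actual text-to-motion generation and masked-prediction functions.

def generate(text: str, num_tokens: int) -> np.ndarray:
    """Pass 1: generate a full motion-token sequence from one prompt."""
    return np.random.randint(0, 512, size=num_tokens)  # placeholder tokens

def inpaint(tokens: np.ndarray, mask: np.ndarray, text: str) -> np.ndarray:
    """Pass 2: re-predict only the masked tokens, conditioned on `text`
    and on the unmasked tokens kept from pass 1."""
    edited = tokens.copy()
    edited[mask] = np.random.randint(0, 512, size=int(mask.sum()))  # placeholder
    return edited

# --- Two-pass temporal editing (Fig. 7 style) ---
blue_text = "a person walks forward"      # prompt for the whole sequence
red_text = "a person jumps"               # prompt for the edited segment

num_tokens = 48
tokens = generate(blue_text, num_tokens)  # pass 1: whole sequence (blue)

mask = np.zeros(num_tokens, dtype=bool)
mask[16:32] = True                        # mask out the "red" span

edited = inpaint(tokens, mask, red_text)  # pass 2: fill masked span (red)

# The unmasked tokens are preserved; only the masked span is regenerated.
assert (edited[~mask] == tokens[~mask]).all()
```

For the Table 5 evaluation, the same masked-prediction step applies, except that the unmasked tokens come from the ground-truth motion and the masked span uses the same prompt, so the result can be scored against ground truth.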
Great work. Congrats.