EricGuo5513 / momask-codes

Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
https://ericguo5513.github.io/momask/
MIT License
690 stars 56 forks

temporal inpainting problems #31

Open tucker666 opened 3 months ago

tucker666 commented 3 months ago

Thank you for your great work. Here is my problem: when I run the in-between editing demo with commands like

python edit_t2m.py --gpu_id 1 --ext exp6 --use_res_model -msec 80,120 --text_prompt "A man is playing football."
python edit_t2m.py --gpu_id 1 --ext exp7 --use_res_model -msec 80,120 --text_prompt "A man is dancing."
python edit_t2m.py --gpu_id 1 --ext exp8 --use_res_model -msec 80,120 --text_prompt "A man is running."
python edit_t2m.py --gpu_id 1 --ext exp9 --use_res_model -msec 80,120 --text_prompt "A man is flying."
python edit_t2m.py --gpu_id 1 --ext exp10 --use_res_model -msec 80,120 --text_prompt "A man is backfliping."

I get 5 BVH results, but they differ from each other even outside frames 80-120. Here are the start frame and the end frame:

[Images: start frame and end frame]

How can I get the same results outside the edited time interval?

Murrol commented 3 months ago

Hi, thanks for your interest.

To get exactly the same motion for the unedited parts, you might disable the res_trans at #Line153, or use token[..., 1:] instead of the predicted residual tokens.

In my view, it may not be reasonable to force the motion to end on exactly the same frame, e.g. when editing "Go straight ahead" vs. "Walk backward". Let me know if I have misunderstood your problem.
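
A minimal sketch of the second suggestion, under one possible reading: keep the base-layer tokens produced by the edit, and outside the edited span reuse the source motion's own residual-layer tokens (token[..., 1:]) instead of newly predicted ones. The function name merge_tokens, the list-of-rows layout, and the token-index span are assumptions for illustration; the repository works with tensors and may organize this differently.

```python
from typing import List


def merge_tokens(
    source: List[List[int]],
    predicted: List[List[int]],
    tok_start: int,
    tok_end: int,
) -> List[List[int]]:
    """Combine token rows of the form [base, res_1, ..., res_k].

    Hypothetical helper, not part of momask-codes.
    Inside [tok_start, tok_end): use the predicted row as-is.
    Outside: keep the predicted base token (expected to equal the source
    base token there) but take the residual layers from the source.
    """
    if len(source) != len(predicted):
        raise ValueError("source and predicted must have the same length")
    merged = []
    for t, (src, pred) in enumerate(zip(source, predicted)):
        in_edit = tok_start <= t < tok_end
        residual = pred[1:] if in_edit else src[1:]
        merged.append([pred[0]] + list(residual))
    return merged


# Toy example: 3 token steps, 1 base layer + 2 residual layers, edit step 1 only.
source = [[1, 10, 11], [2, 20, 21], [3, 30, 31]]
predicted = [[1, 90, 91], [7, 92, 93], [3, 94, 95]]
merged = merge_tokens(source, predicted, tok_start=1, tok_end=2)
print(merged)
```

With this merge, every token step outside the edited span is identical to the source, so any remaining difference in the decoded frames would have to come from the decoder rather than from the tokens.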

Shawneeee commented 1 month ago

Hello, thank you for your answer, but I have the same problem. My source_motion input is 80 frames long, and I want to keep frames 0-8 fixed. I ran the following command with res_trans disabled, but frames 0-8 of the generated motion still do not match the input:

python edit_t2m.py --gpu_id 1 --ext exp -msec 9,80 --text_prompt "A man is walking."
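
One thing worth checking in a case like this is whether the edit boundary lines up with a token boundary. The sketch below assumes each motion token covers 4 frames and that any token touched by the edited frame range is regenerated as a whole; both are assumptions for illustration, and the rounding used by edit_t2m.py may differ. The helper affected_frames is hypothetical.

```python
FRAMES_PER_TOKEN = 4  # assumption: temporal downsampling factor of the tokenizer


def affected_frames(start_frame, end_frame, num_frames,
                    frames_per_token=FRAMES_PER_TOKEN):
    """Map an edited frame range [start_frame, end_frame) to token indices.

    Returns (tok_start, tok_end, first_frame, last_frame), where
    [first_frame, last_frame) is the frame range covered by the touched tokens.
    """
    tok_start = start_frame // frames_per_token      # round start down
    tok_end = -(-end_frame // frames_per_token)      # round end up (ceiling)
    first_frame = tok_start * frames_per_token
    last_frame = min(tok_end * frames_per_token, num_frames)
    return tok_start, tok_end, first_frame, last_frame


result = affected_frames(9, 80, 80)
print(result)  # (2, 20, 8, 80)
```

Under these assumptions, an edit starting at frame 9 touches the token that covers frames 8-11, so frame 8 would already fall inside the regenerated region. Starting the edit on a multiple of 4 avoids that particular mismatch.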

Murrol commented 1 month ago

Hi,

You can check the tokens before and after editing to see whether the tokens for the fixed period are unchanged. The decoder might still introduce some variance into the fixed frames, especially when the fixed clip is short, because the decoder has a relatively large receptive field: each output frame depends on neighboring tokens, including edited ones. So small differences are expected in our design. If you still have concerns, please provide a distance error measure or a visual comparison so I can better check whether it is a bug or a difficult case.
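
The two checks requested above could be sketched roughly as follows. This is a plain-Python illustration, not code from the repository: the helper names, the nested-list layout (token rows, and frames of 3D joint positions), and the token span are all assumptions. In practice the same logic would be applied to the tensors or arrays produced by the script.

```python
import math
from typing import List, Sequence


def changed_token_steps(before: Sequence[Sequence[int]],
                        after: Sequence[Sequence[int]],
                        tok_start: int,
                        tok_end: int) -> List[int]:
    """Token steps outside the edited span [tok_start, tok_end) whose rows differ."""
    return [
        t for t, (b, a) in enumerate(zip(before, after))
        if not (tok_start <= t < tok_end) and list(b) != list(a)
    ]


def per_frame_error(source, edited) -> List[float]:
    """Mean Euclidean distance over joints, for each frame.

    source, edited: frames x joints x 3 nested sequences of joint positions.
    """
    errors = []
    for src_frame, out_frame in zip(source, edited):
        dists = [math.dist(p, q) for p, q in zip(src_frame, out_frame)]
        errors.append(sum(dists) / len(dists))
    return errors


# Toy data: 3 token steps; only step 1 was meant to be edited.
tokens_before = [[5, 1], [6, 2], [7, 3]]
tokens_after = [[5, 1], [9, 9], [7, 4]]
leaks = changed_token_steps(tokens_before, tokens_after, tok_start=1, tok_end=2)
print(leaks)  # [2]

# Toy data: 3 frames, 2 joints; one joint moves by (0, 3, 4) in frame 1.
src = [[[0, 0, 0], [1, 0, 0]]] * 3
out = [src[0], [[0, 3, 4], [1, 0, 0]], src[2]]
errors = per_frame_error(src, out)
print(errors)  # [0.0, 2.5, 0.0]
```

If the first check returns an empty list but the per-frame error is non-zero on the fixed frames, the difference comes from decoding rather than from the tokens, which matches the receptive-field explanation. Reporting the error per frame also shows whether it grows toward the edit boundary.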