facebookresearch / AGRoL

Code release for "Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model", CVPR 2023

Real Time Prediction #6

Closed StefanoBraghetto closed 11 months ago

StefanoBraghetto commented 1 year ago

Hello,

Great work! This has definitely pushed the boundaries of the state of the art. Congratulations!

I would like to ask how to run the model for real-time predictions. Should I create a small buffer (t-->0) with some point movements in 6D rotation and feed it to the model? Is it expected to use the non_overlapping_test or the overlapping_test method?

Thank you in advance for your response.

StefanoBraghetto commented 1 year ago

Just to provide more info.

My first attempt at real-time prediction was to use the overlapping_test function with sld_wind_size=1, which should create sparse_splits that advance one frame at a time. But this is the result:

https://github.com/facebookresearch/AGRoL/assets/46006663/21a181ad-5626-4533-8b1f-069c0b875940

In the generated animation there is a trembling effect, most noticeable in the legs. My initial thought was that this happens because the diffusion model starts its prediction from randomly generated noise. On further analysis, however, the result stays the same even when fixing the seed or switching to the MLP model: every time a new frame is added, the model generates a new motion for the entire sequence, from which only the last frame is kept. In case that isn't clear, I tried to illustrate it in the following drawing.

[diagram: each new frame triggers a full re-generation of the sequence; only the last frame is kept]

What I'm trying to say with all of this is that the model is not usable for real-time prediction. Could you confirm this?

Thank you

yufu-liu commented 1 year ago

Hi, thanks for sharing! I encountered the same issue with real-time prediction: the avatar's legs kept trembling with small sliding-window sizes (around 1~10). Have you solved it yet?

Combo1 commented 1 year ago

Hi, in which part did you guys insert your HMD + controller data into the model? I tried replacing `hmd_position_global_full_gt_list` with my own HMD/controller data, but there is no documentation on this. I only know that the first 18 values hold the rotations, the following 18 the rotational velocities, the next 9 the positions, and the last 9 the positional velocities, but I still get very weird results. Can you share how you inserted your data into the AGRoL pipeline?

StefanoBraghetto commented 1 year ago

Hi @Combo1 ,

I wouldn't like to lose the main subject of this issue, which is to understand whether the model is suitable for real-time prediction; I'd love the authors of this paper to clarify this. But if you want, maybe you could share your script in another issue so we can see whether there is any problem. I did just what you described with the sparse input vector.

yufu-liu commented 1 year ago

@Combo1 I think there are many reasons why you might be stuck with weird results. (Perhaps we are all stuck on different problems.) My guess is that you insert your data at the right place, but your data isn't transformed correctly. @StefanoBraghetto Since the project shows a demo video, it should work. My current problem is that I can't speed my inference up to the 60 Hz frame rate, which means my avatar looks laggy. Moreover, when collecting data faster than 60 Hz it starts trembling, which suggests the collected data is too sensitive for the model. I suggest you examine your sampling rate and inference rate, Stefano; both should be 60 Hz.

StefanoBraghetto commented 1 year ago

@yufu-liu thanks for your answer. In my implementation, the trembling stopped when I gave the model a prediction delay of at least 10 frames (that is, collecting 10 new frames before making the next prediction).

Also, I am using the same dataset as in AGRoL, so I am pretty sure I am using a 60 Hz frame rate. The demo says "Real Data", not "Real Time", which is different. Are you doing real-time prediction, i.e. making a new prediction with each new frame? If so, could you please give more details on how you are doing this? Are you using sld_wind_size=1?

Thank you again!

yufu-liu commented 1 year ago

@StefanoBraghetto I see. So you just switched the evaluation function from non_overlapping_test to overlapping_test and set the sliding-window size to 10, right? I haven't tried that yet, but I tried the non-overlapping version and got smooth animations.

So your understanding is that the demo was filmed, the data was collected, and then the authors used the collected data to generate the animation afterwards? If that's true, we should stop here, go back to AvatarPoser, and modify it! I only found the animation weird when the man runs backwards.

Yes! I'm doing real-time inference and ideally want to get the last 196 frames every 16 ms. Perhaps you can check whether your window step and window size are correct.

StefanoBraghetto commented 1 year ago

@yufu-liu thanks again

If you aim to feed 196 new frames to the model every 16 ms, your frame rate would be 196 / 0.016 = 12,250 frames per second, which is much faster than what the model was trained for. As you can see here:

[screenshot of the training configuration omitted]

At that rate the input data to the model would barely change: every frame would be almost identical. Am I misunderstanding you?

thank you again

yufu-liu commented 1 year ago

@StefanoBraghetto Yeah, there is a little misunderstanding.

Here is my thought: [diagram of the sliding-window setup omitted]

So the window size is 196 frames, the overlap is 195 frames, and the sliding step is 1. However, it is too hard to implement this smoothly. The overlap could be smaller: for example, the paper suggests 70, which means the window still moves 126 frames forward after every inference. It is also necessary to adjust the visualization part when you adjust the window size.

Stefano-retinize commented 1 year ago

Thank you again,

Then the model is not suitable for online prediction? I can't see how to solve the trembling, since each new frame comes from a newly generated motion that is slightly different. I think that to avoid the trembling the model should also receive the last frames of the previous prediction as context, so it could fit the new prediction to the past motion.

Combo1 commented 1 year ago

Thanks for your replies! I checked my transformations again and indeed found the error that caused my avatar to behave weirdly; now it looks acceptable. For real-time inference, I have a similar doubt that AGRoL and AvatarPoser might be recorded first and predicted afterwards, not run in real time. When I try to infer a four-second clip it takes my machine over a minute, but this might be due to my hardware restrictions.

StefanoBraghetto commented 1 year ago

@Combo1, that's weird. On a really average computer (without even a GPU) it takes less than 0.1 seconds to run inference on a clip of 196 frames at 60 Hz, which is about a 3-second clip. You should be able to improve that time.

yufu-liu commented 1 year ago

Hi, nice to hear that! But I have the same concern about the speed. Can you share your sliding-window size and window step?

Lastly, do you find any trembling effect or jitter in real-time inference?

Combo1 commented 1 year ago

@StefanoBraghetto This is very surprising to hear. My machine is equipped with an NVIDIA GeForce RTX 2060 Super, but I will check whether I can speed this process up. That said, I was referring to AvatarPoser, not AGRoL, my bad. Since AGRoL seems to be built on a large amount of code from AvatarPoser, I figured I could search here to find more information about it.

```
python main_test_avatarposer.py
export CUDA_VISIBLE_DEVICES=0
number of GPUs is: 1
LogHandlers setup!
-------------------------------number of test data is 329
Dataset [AMASS_Dataset - test_dataset] is created.
Initialization method [kaiming_normal + uniform], gain is [0.20]
Training model [ModelAvatarPoser] is created.
Loading model for G [model_zoo/avatarposer.pth] ...
23-07-17 14:21:53.854 : testing the sample 0/329
None
results/AvatarPoseEstimation/videos/0/None.avi
23-07-17 14:35:23.771 : testing the sample 1/329
None
results/AvatarPoseEstimation/videos/1/None.avi
23-07-17 14:36:34.273 : testing the sample 2/329
```

I modified my code a bit, but now that you mention it, I guess most of the time is spent drawing the video clip.

@yufu-liu In case you meant me: I will start working with AGRoL now, since AvatarPoser seems to work. If I am able to generate motions without any trembling effect, I will share my results with you.

cccvision commented 1 year ago

I ran into similar issues. Did you manage to get smooth results with the overlapping test? Also, is it possible to run this model in real time?

yufu-liu commented 1 year ago

Hi, I would like to share some ideas.

Maybe this is a real limitation of the algorithm, but some things can still be done. For example, we can modify the model or post-process the predicted rotations and orientations. Moreover, from what we have found, window size and trembling level are a trade-off, so in real-time cases we could set the window size to an acceptable latency, say 10-30 frames.

asanakoy commented 11 months ago

Closing.

gb2111 commented 7 months ago

> If I am able to generate motions without any trembling effect, I will share my results with you.

@Combo1 Do you mind sharing this with me as well? Thanks.