Could you please confirm if the inference works correctly when using the recommended RealisticVision version?
I just tried the recommended version and it's the same. BTW I used the latest diffusers (0.29.2) and PyTorch 1.13, but I guess that wouldn't cause NaNs...
When using the environment we provided (environment.yaml), have you encountered similar issues?
I did some simple debugging. The values in latents_group (pipeline.py:L757) seem to increase quickly and eventually explode. After each iteration, latents_group.abs().max() grows: 40, then 41, 42, ... I tried to address this by setting grad_guidance_threshold to 0.1, but that just produced a messy video: https://github.com/Bujiazi/MotionClone/assets/1575461/728f1f01-ff10-448f-917f-112ca7414329
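For anyone reproducing this, here is a minimal sketch of the kind of per-step check I used; the helper name and the blow-up threshold are illustrative only (not from the repo), and it can be called right after latents_group is updated in the sampling loop:

import torch

def check_latents(latents: torch.Tensor, step: int, blowup_threshold: float = 1e3) -> None:
    # Print simple health statistics for the latents at a given denoising step.
    max_abs = latents.abs().max().item()
    num_nan = torch.isnan(latents).sum().item()
    print(f"step {step:4d}  max|latent| = {max_abs:8.2f}  NaNs = {num_nan}")
    if num_nan > 0 or max_abs > blowup_threshold:
        print(f"step {step}: latents look unstable (max {max_abs:.2f}, {num_nan} NaNs)")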
Now I change grad_guidance_threshold to 1, but it seems it's still going to explode (just a bit more slowly than without setting grad_guidance_threshold):
tensor(48.9375, device='cuda:0', dtype=torch.float16) 15%|██████████▏ | 77/500 [02:50<15:39, 2.22s/it]
tensor(49.4375, device='cuda:0', dtype=torch.float16) 16%|██████████▎ | 78/500 [02:52<15:37, 2.22s/it]
tensor(49.8750, device='cuda:0', dtype=torch.float16) 16%|██████████▍ | 79/500 [02:54<15:36, 2.22s/it]
tensor(50.3125, device='cuda:0', dtype=torch.float16) 16%|██████████▌ | 80/500 [02:57<15:35, 2.23s/it]
tensor(50.5000, device='cuda:0', dtype=torch.float16) 16%|██████████▋ | 81/500 [02:59<15:32, 2.23s/it]
tensor(51.2812, device='cuda:0', dtype=torch.float16) 16%|██████████▊ | 82/500 [03:01<15:31, 2.23s/it]
tensor(51.4375, device='cuda:0', dtype=torch.float16) 17%|██████████▉ | 83/500 [03:03<15:28, 2.23s/it]
tensor(51.8750, device='cuda:0', dtype=torch.float16) 17%|███████████ | 84/500 [03:06<15:27, 2.23s/it]
tensor(53.2188, device='cuda:0', dtype=torch.float16) 17%|███████████▏ | 85/500 [03:08<15:22, 2.22s/it]
tensor(52.9062, device='cuda:0', dtype=torch.float16) 17%|███████████▎ | 86/500 [03:10<15:19, 2.22s/it]
tensor(53.6562, device='cuda:0', dtype=torch.float16) 17%|███████████▍ | 87/500 [03:12<15:17, 2.22s/it]
tensor(55.1562, device='cuda:0', dtype=torch.float16) 18%|███████████▌ | 88/500 [03:14<15:15, 2.22s/it]
tensor(55.6562, device='cuda:0', dtype=torch.float16) 18%|███████████▋ | 89/500 [03:17<15:10, 2.22s/it]
tensor(55.9062, device='cuda:0', dtype=torch.float16)
EDIT: after 500 iterations the max value is 464 and I got another messy video: https://github.com/Bujiazi/MotionClone/assets/1575461/09b756d5-8a73-41fd-a1fa-4bb0c22cf185
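For context, this is the kind of thresholding I assumed grad_guidance_threshold performs; it is a hypothetical sketch only, and the actual behavior in pipeline.py may differ:

import torch

def clamp_guidance_grad(grad: torch.Tensor, threshold: float) -> torch.Tensor:
    # Hypothetical: limit each element of the guidance gradient to [-threshold, threshold]
    # before it is used to update the latents, so one large gradient cannot blow them up.
    return grad.clamp(-threshold, threshold)

# e.g. latents = latents - guidance_scale * clamp_guidance_grad(grad, threshold=0.1)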
Thanks for the feedback. Because the Diffusers and Torch versions in your environment differ from the ones we recommend, you may encounter some unexpected issues 😂. We strongly recommend using the environment we have specified:
conda env create -f environment.yaml
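A quick way to confirm the new environment is actually the one being picked up is to print the installed versions (the authoritative pins live in environment.yaml; this snippet only prints what is importable):

import torch
import diffusers

print("torch     :", torch.__version__)
print("diffusers :", diffusers.__version__)
print("cuda      :", torch.version.cuda)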
Thanks. Yeah, I just used the recommended environment and it seems the tensor values are normal. Will update once it's finished.
Nice, eventually got the right video! https://github.com/Bujiazi/MotionClone/assets/1575461/9c96e7c9-fcf2-4538-b227-6b8206ec5aae
It's a bit dark but I'm happy it works. Any method to make it brighter? Maybe add "bright lighting" to the prompt?
BTW the first 300 iterations are pretty slow (2.22 s/it on an A6000), but the last 200 iterations run at 1.47 it/s (3.26× the speed of the first 300). It seems the prompt conditioning takes a lot of time. Have you tried applying prompt conditioning only once every N iterations to speed things up?
It is great to see that you have successfully run MotionClone 😄. Feel free to try various prompts. When we ran the astronaut example, we obtained a bright result like this:
Guidance on every step of the first 300 steps. Took 802s. https://github.com/Bujiazi/MotionClone/assets/1575461/5533e38b-3c52-4cad-a836-e7d800bfe3a9
Guidance on 1 out of every 3 steps of the first 300 steps. Took 493s (40% speed up). https://github.com/Bujiazi/MotionClone/assets/1575461/1c940837-78ed-4a1d-a5d4-55cffe799969
The cross-frame consistency looks worse. Despite that, it looks ok.
Guidance once every 2 steps of the first 300 steps (skip half). Now it looks as good as without skipping, but only takes 572s (30% speed up) https://github.com/Bujiazi/MotionClone/assets/1575461/4d6ea298-b792-4bcc-ab77-51f012f9d5af
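For reference, a minimal sketch of the skipping schedule used in these experiments, assuming the sampling loop can be split into a plain denoising step and a (more expensive) guided step; the callables below are stand-ins, not MotionClone's actual API:

from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_skipped_guidance(
    latents: T,
    plain_step: Callable[[T, int], T],
    guided_step: Callable[[T, int], T],
    num_steps: int = 500,
    guidance_steps: int = 300,
    apply_every: int = 2,
) -> T:
    # Apply the expensive guided step only on every `apply_every`-th iteration
    # within the first `guidance_steps` steps; use the plain step otherwise.
    for step in range(num_steps):
        if step < guidance_steps and step % apply_every == 0:
            latents = guided_step(latents, step)   # motion guidance: needs an extra backward pass
        else:
            latents = plain_step(latents, step)    # plain denoising: forward pass only
    return latents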
Thank you very much for your exploration 🌹; the results are indeed impressive 😘. We also tried a similar skipping mechanism in the early stages of our experiments, but found it was not very stable and would sometimes fail in certain cases; perhaps the steps we skipped were too large. We will consider incorporating a stable skipping mechanism in future optimized versions to accelerate inference.
Glad that it helps! Will go back to integrate my component 😄
We have updated the code. MotionClone is now able to 1) directly perform motion customization without cumbersome video inversion, and 2) significantly reduce memory consumption. In our experiments, for 16×512×512 text-to-video the memory consumption is about 14 GB; for MotionClone combined with image-to-video or sketch-to-video it is about 22 GB. Hope this helps.
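To double-check the memory figures on a given setup, standard PyTorch peak-memory calls can be used (this is not something MotionClone itself reports; the pipeline call below is a placeholder):

import torch

torch.cuda.reset_peak_memory_stats()
# ... run the MotionClone pipeline here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak GPU memory: {peak_gb:.1f} GB")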
I tried to do inference with the following command:
python3 sample.py --config configs/inference_config/astronaut.yaml
(The only things I changed were using a revised version of RealisticVision and fixing some mismatched ckpt key names.) It produces a video tensor in which all elements are NaN:
I've checked and made sure the inversion file inversion/inverted_data_astronaut.pkl doesn't contain NaN values (the 'all_latents_inversion' and 'inversion_prompt_embeds' tensors look normal). Any thoughts on why this might happen? Thanks.
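For completeness, a minimal sketch of the NaN check described above; it assumes the pickle holds a dict containing the two mentioned keys, and the handling of a list of latents is a guess at the file's structure:

import pickle
import torch

with open("inversion/inverted_data_astronaut.pkl", "rb") as f:
    data = pickle.load(f)

for key in ("all_latents_inversion", "inversion_prompt_embeds"):
    value = data[key]
    tensors = value if isinstance(value, (list, tuple)) else [value]
    has_nan = any(torch.isnan(t.float()).any().item() for t in tensors)
    print(f"{key}: contains NaNs = {has_nan}")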