Open iloveOREO opened 1 month ago
Thanks for your feedback, @iloveOREO.
We have updated the main branch to fix a small bug, and the last frame no longer has pursed lips.
We will post more details about this phenomenon in this issue tomorrow.
Thanks for your feedback, @iloveOREO.
If you want the source video and driving video to be the same video, and the animated video to be as similar as possible to the source video, you can use `python inference.py --no_flag_relative_motion --no_flag_do_crop`. This gives the following result:
https://github.com/user-attachments/assets/c1e73ee1-e151-41f8-833f-ffcbb2fa3ef8
Here, we are using absolute driving rather than relative driving. With `--flag_relative_motion`, the motion offset of the current driving frame relative to the first driving frame is added to the motion of the source frame to form the final driving motion. With `--no_flag_relative_motion`, the motion of the current driving frame is used directly as the final driving motion.
If you use the default `--flag_relative_motion`, then when the source frame is already a smile and the driving frame has an expression deformation relative to the first driving frame, the animated frame is that smile offset added on top of the source smile, so the expression is amplified. The animated video in this setting is as follows:
https://github.com/user-attachments/assets/7d36f137-cca5-4945-935b-242f240e3f56
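To make the difference concrete, here is a minimal sketch of the two modes as described above, applied to a toy one-dimensional "smile amount". The function names `relative_motion` and `absolute_motion` and the scalar expression values are illustrative assumptions, not the repository's actual code, which operates on keypoint tensors.

```python
def relative_motion(exp_s, exp_d_i, exp_d_0):
    """Relative driving: source motion plus the driving offset
    measured from the first driving frame."""
    return [s + (d - d0) for s, d, d0 in zip(exp_s, exp_d_i, exp_d_0)]


def absolute_motion(exp_d_i):
    """Absolute driving: the current driving motion is used directly."""
    return list(exp_d_i)


# Toy values (assumed): 0.0 = neutral face, 1.0 = smile.
# Source and driving are the same clip, so source frame i equals driving frame i.
exp_d_0 = [0.0]  # first driving frame: neutral
exp_d_i = [1.0]  # current driving frame: smile
exp_s_i = [1.0]  # source frame i: the same smile

rel = relative_motion(exp_s_i, exp_d_i, exp_d_0)  # smile + (smile - neutral)
abs_ = absolute_motion(exp_d_i)                   # smile only

print("relative:", rel)
print("absolute:", abs_)
```

In this toy setting the relative result is twice the driving smile, which matches the "smile added to the smile" amplification, while the absolute result reproduces the driving frame.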
Thank you for your reply. 'Absolute driving' performs well in this case.
However, I also tried generating with different videos/IDs and found that there is always some jitter when using `--no_flag_relative_motion` and no relative head rotation (v2v).
https://github.com/user-attachments/assets/b91c2c69-70bb-443f-bfb5-f1b0b0f1ac1e
Initially, I thought this was caused by `t_new = x_d_i_info['t']`, so I tried changing it to `t_new = x_s_info['t']` (since `R_new = R_s`, shouldn't this be the case?), but the results didn't change significantly. Finally, I tried setting `t_new = torch.zeros(x_d_i_info['t'].size()).to(device)` and found no visible difference in the generated results. So, is `x_d_i_info['exp']` the main source of the head jitter?
https://github.com/user-attachments/assets/a53271f8-9eac-44a0-bb3d-764291562a2b
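For reference, the three `t_new` variants tried above can be laid out side by side. This is a simplified sketch using plain lists instead of tensors; the helper `compose_motion`, its `t_mode` argument, and the returned dictionary layout are hypothetical and only mirror the names quoted above (`x_s_info`, `x_d_i_info`, `R_s`).

```python
def compose_motion(R_s, x_s_info, x_d_i_info, t_mode="driving"):
    """Keep the source rotation, take the expression from the driving
    frame, and pick the translation according to t_mode."""
    if t_mode == "driving":
        t_new = list(x_d_i_info["t"])          # t_new = x_d_i_info['t']
    elif t_mode == "source":
        t_new = list(x_s_info["t"])            # t_new = x_s_info['t']
    elif t_mode == "zero":
        t_new = [0.0] * len(x_d_i_info["t"])   # zeros with the same size
    else:
        raise ValueError(f"unknown t_mode: {t_mode}")
    return {"R": R_s, "t": t_new, "exp": list(x_d_i_info["exp"])}


# Toy inputs (assumed shapes and values, for illustration only).
R_s = [[1.0, 0.0], [0.0, 1.0]]
x_s_info = {"t": [0.1, 0.2], "exp": [0.0, 0.0]}
x_d_i_info = {"t": [0.3, 0.4], "exp": [0.5, -0.5]}

variants = {m: compose_motion(R_s, x_s_info, x_d_i_info, m)
            for m in ("driving", "source", "zero")}
for mode, motion in variants.items():
    print(mode, motion["t"], motion["exp"])
```

In all three variants the expression term is the only component still taken from the driving frame, which is why it is the remaining candidate for the jitter in the experiment above.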
How can real 'absolute driving' be achieved, where only the expression is edited and the original head movement is retained?
Additionally, I noticed that the paper specifically mentions: "Note that the transformation differs from the scale orthographic projection, which is formulated as x = s · (x_c + δ)R + t." Could the current representation be causing instability in the generated results under the driving video, due to the inability to fully decouple `exp` from `R`?
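The algebraic difference between the two forms can be checked with a small 2D example. This sketch assumes the paper's own transformation is x = s · (x_c R + δ) + t, with δ added after the rotation; that form is my reading of the paper and is not quoted in this thread. The helper names are hypothetical, and the example only illustrates the formulas, not the cause of the jitter.

```python
import math


def rot2d(theta):
    """2D rotation matrix, applied to row vectors as v @ R."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def row_times(v, R):
    """Row vector times 2x2 matrix."""
    return [v[0] * R[0][0] + v[1] * R[1][0],
            v[0] * R[0][1] + v[1] * R[1][1]]


def paper_form(x_c, delta, R, s, t):
    """Assumed paper form: x = s * (x_c R + delta) + t."""
    xr = row_times(x_c, R)
    return [s * (xr[i] + delta[i]) + t[i] for i in range(2)]


def ortho_form(x_c, delta, R, s, t):
    """Scaled orthographic form quoted above: x = s * (x_c + delta) R + t."""
    xr = row_times([x_c[i] + delta[i] for i in range(2)], R)
    return [s * xr[i] + t[i] for i in range(2)]


def delta_effect(form, x_c, delta, R, s, t):
    """Output displacement caused by delta alone."""
    with_d = form(x_c, delta, R, s, t)
    without = form(x_c, [0.0, 0.0], R, s, t)
    return [with_d[i] - without[i] for i in range(2)]


x_c, delta, s, t = [1.0, 2.0], [1.0, 0.0], 1.0, [0.0, 0.0]
R = rot2d(math.pi / 2)  # head rotated by 90 degrees

effect_paper = delta_effect(paper_form, x_c, delta, R, s, t)
effect_ortho = delta_effect(ortho_form, x_c, delta, R, s, t)
print("paper form:", effect_paper)
print("ortho form:", effect_ortho)
```

Under the assumed paper form, δ shifts the output keypoint by s · δ regardless of R, so δ lives in the output frame rather than the head frame. In the scaled orthographic form, δ is rotated together with the head. One possible reading of the question above is that, in the first form, the δ needed for a given facial expression depends on the head pose, so `exp` and `R` are not fully independent.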
When using v2v for expression driving with the same video as both source and driving, I observed 'exaggerated expressions' in the results (the mouth opens wider or closes less). Shouldn't the result be exactly the same as the driving video?
https://github.com/user-attachments/assets/4366169f-fd45-4a69-ab2e-d81a93ee55d3