Open sbersier opened 1 year ago
Please guide me on how I can control eye blinking. Thanks
@parthagorai The expression_scale factor covers both the mouth and the eyes, and I don't think it is possible to separate them this way.
But you can always use a reference video for the eye blinks.
For example:
python inference.py --ref_eyeblink eyeblink.mp4 --driven_audio audio.wav --source_image image.png --size 256 --enhancer gfpgan --pose_style 2 --expression_scale 1.5 --head_motion_scale 0.5
Here eyeblink.mp4 would point to a video of a real person blinking. I'm not sure whether it keeps the eyelid motion (including its "scale") or just detects that the person blinked at that particular moment. It could be worth trying.
Thanks for your reply @sbersier.
I'd be happy if you could provide guidance on achieving more natural head motion and expressions, similar to what other AI tools like Heygen AI offer. I am using --preprocess full.
@parthagorai Well, you can always use a video to drive the head motion. Something like:
python inference.py --ref_pose reference_pose.mp4 --ref_eyeblink eyeblink.mp4 --driven_audio audio.wav --source_image image.png --size 256 --enhancer gfpgan --pose_style 2 --expression_scale 1.5 --head_motion_scale 0.5
Here reference_pose.mp4 is a video (it may be better if the person in it is not talking). SadTalker will then copy the head motion from reference_pose.mp4 and the eye blinks from eyeblink.mp4.
If you combine this with the expression_scale and head_motion_scale factors, I think the result will be as good as SadTalker can get.
Thank you so much for this tutorial and for making it easy to understand. I was wondering how I can get the teeth visibility and other features discussed here, and now it finally works.
Hi, SadTalker is really nice. Good job! Thanks!
I'm currently experimenting with it and noticed that expression_scale doesn't disentangle head motion from lip motion.
So, I modified the code (see below) in order to be able to control "head motion amplitude" and "mouth motion amplitude" independently.
For example, it allows me to generate a video with:
python inference.py --driven_audio audio.wav --source_image image.png --size 256 --enhancer gfpgan --pose_style 2 --expression_scale 1.5 --head_motion_scale 0.5
This will amplify mouth motion by a factor of 1.5 while scaling head motion down by a factor of 0.5.
Note: if you specify only expression_scale, head_motion_scale defaults to 1.0 (the usual behavior).
Here is an example:
https://github.com/OpenTalker/SadTalker/assets/34165937/b83ca0e7-2249-4b57-b0ca-79688834b868
The above example was generated with:
python inference.py --driven_audio audio.wav --source_image image.png --size 256 --enhancer gfpgan --pose_style 2 --expression_scale 1 --head_motion_scale 1
(on the left) and
python inference.py --driven_audio audio.wav --source_image image.png --size 256 --enhancer gfpgan --pose_style 2 --expression_scale 2 --head_motion_scale 0.5
(on the right).
If you want to explore a bit, you can also try setting expression_scale to 0.0 while setting head_motion_scale to 2.0. Or you can try setting expression_scale to 2.0 while setting head_motion_scale to 0.0.
I found it quite fun and quite informative to play with. For example, with head_motion_scale set to 0.0 and expression_scale set to 1.0, we see that when the character blinks, the head moves a bit. It looks like head motion and eye blinks are not as disentangled as I would expect. What do you think?
Best regards, SB
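To explore combinations like the ones above more systematically, a small script can generate one command per pair of scale values. This is only a sketch: build_commands and BASE_ARGS are hypothetical names, the flags are copied from the commands in this thread, and --head_motion_scale is only accepted once the modification described in this post has been applied.

```python
import itertools
import shlex

# Flags copied from the commands in this thread; --head_motion_scale only
# exists after applying the modification described in this post.
BASE_ARGS = [
    "python", "inference.py",
    "--driven_audio", "audio.wav",
    "--source_image", "image.png",
    "--size", "256",
    "--enhancer", "gfpgan",
    "--pose_style", "2",
]


def build_commands(expression_scales, head_motion_scales):
    """Return one argument list per (expression_scale, head_motion_scale) pair."""
    commands = []
    for exp, head in itertools.product(expression_scales, head_motion_scales):
        commands.append(BASE_ARGS + [
            "--expression_scale", str(exp),
            "--head_motion_scale", str(head),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands([0.0, 1.0, 2.0], [0.0, 0.5, 2.0]):
        # Print only; pass cmd to subprocess.run(cmd, check=True) to render.
        print(shlex.join(cmd))
```

The script only prints the commands, so it is safe to run before deciding which combinations are worth rendering.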
The modifications to the code:
NOTE: Before modifying the code, I recommend making a copy of SadTalker/inference.py and SadTalker/src/generate_facerender_batch.py and putting these copies in a safe place, so that you can always revert.
A) In SadTalker/inference.py
On line 85:
Replace:
expression_scale=args.expression_scale, still_mode=args.still, preprocess=args.preprocess, size=args.size)
With the following:
expression_scale=args.expression_scale, head_motion_scale=args.head_motion_scale, still_mode=args.still, preprocess=args.preprocess, size=args.size)
On line 109:
Replace:
parser.add_argument("--expression_scale", type=float, default=1., help="the batch size of facerender")
With the two following lines:
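The two replacement lines are not shown here. A minimal sketch of what they could look like, assuming the new flag simply mirrors the existing one: the default of 1.0 for --head_motion_scale comes from the note earlier in this post, while the help strings are my own wording and not the author's.

```python
import argparse

parser = argparse.ArgumentParser()
# Existing flag (help text reworded here for clarity).
parser.add_argument("--expression_scale", type=float, default=1.,
                    help="scale factor for expression (mouth/eyes) coefficients")
# New flag, assumed to mirror --expression_scale; the 1.0 default keeps the
# usual behavior when the flag is omitted.
parser.add_argument("--head_motion_scale", type=float, default=1.,
                    help="scale factor for head pose coefficients")

args = parser.parse_args(["--expression_scale", "1.5", "--head_motion_scale", "0.5"])
print(args.expression_scale, args.head_motion_scale)
```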
B) In SadTalker/src/generate_facerender_batch.py
On line 10:
Replace:
expression_scale=1.0, still_mode = False, preprocess='crop', size = 256):
With the following:
expression_scale=1.0, head_motion_scale=1.0, still_mode = False, preprocess='crop', size = 256):
On line 44:
Replace:
generated_3dmm[:, :64] = generated_3dmm[:, :64] * expression_scale
with the following two lines:
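The two replacement lines are not shown here. As a sketch of the idea, assuming each row of generated_3dmm holds 64 expression coefficients (as the [:, :64] slice suggests) followed by the head pose coefficients, the change amounts to scaling the two slices independently. The function scale_coefficients below is a hypothetical pure-Python stand-in that works on nested lists, so it runs without PyTorch. Treating everything after index 64 as head pose is an assumption.

```python
N_EXPRESSION = 64  # taken from the [:, :64] slice in the original line


def scale_coefficients(frames, expression_scale=1.0, head_motion_scale=1.0):
    """Scale expression and (assumed) head pose coefficients independently.

    frames: list of per-frame coefficient lists. On the real tensor this would
    presumably be two slice assignments, e.g.
        generated_3dmm[:, :64] = generated_3dmm[:, :64] * expression_scale
        generated_3dmm[:, 64:] = generated_3dmm[:, 64:] * head_motion_scale
    where the second line is an assumption, not the author's confirmed code.
    """
    scaled = []
    for frame in frames:
        expression = [v * expression_scale for v in frame[:N_EXPRESSION]]
        head_pose = [v * head_motion_scale for v in frame[N_EXPRESSION:]]
        scaled.append(expression + head_pose)
    return scaled


if __name__ == "__main__":
    frame = [1.0] * 64 + [2.0] * 6  # 6 pose values is an illustrative choice
    out = scale_coefficients([frame], expression_scale=1.5, head_motion_scale=0.5)
    print(out[0][0], out[0][64])
```

With both scales left at 1.0 the coefficients pass through unchanged, which matches the default behavior described in the note earlier in this post.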