OpenTalker / SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
https://sadtalker.github.io/

Bug in the automatic1111 SadTalker extension (pose style slider) #377

Open sbersier opened 1 year ago

sbersier commented 1 year ago

Object:

SadTalker webui extension for automatic1111:
Source: https://github.com/Winfredy/SadTalker
Version: a810cbe1 (Tue Jun 6 16:09:05 2023)
Installed via the automatic1111 extension tab

Description of the bug:

The extension crashes when the Pose style slider is set to its maximum value, 46.

To reproduce:

1) Provide an image and an audio file.
2) Use any settings (face model resolution, preprocess mode, still mode and batch size).
3) Set the Pose style slider to 46.

The problem:

Disclaimer: I'm not a developer, so...

The config/auido2exp.yaml and config/auido2pose.yaml configuration files state that there are 46 classes (for pose style), so the valid class indices are 0 to 45. The slider, however, goes from 0 to 46 inclusive, which gives 47 possible values. That is one more than the 46 classes declared in auido2exp.yaml and auido2pose.yaml, and the extra value, 46, has no matching class.

Ultimately, this seems to be what is causing the crash.
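The counting argument can be checked in a few lines. This is only a sketch: `NUM_CLASSES` and `SLIDER_MAX` are hard-coded here from the numbers quoted in this report, not read from the actual yaml files or app.py.

```python
# Values quoted in this report (hard-coded here, not read from the repo).
NUM_CLASSES = 46   # class count stated in the yaml configs
SLIDER_MAX = 46    # current maximum of the Pose style slider

# Every value the slider can return with minimum=0, step=1.
slider_values = list(range(0, SLIDER_MAX + 1))

# Class indices a model with NUM_CLASSES classes can accept: 0..NUM_CLASSES-1.
valid_classes = set(range(NUM_CLASSES))

# Slider values that have no matching class.
invalid = [v for v in slider_values if v not in valid_classes]

print(len(slider_values), len(valid_classes), invalid)  # 47 46 [46]
```

Only the last slider position falls outside the valid range, which matches the crash appearing only at 46.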

I could trace the error as follows:

0) In app.py, assume the pose_style slider returns 46.

1) gradio_demo.py (gradio_demo.test, line 82) calls audio_to_coeff (in test_audio2coeff.py) with pose_style=46:

coeff_path = self.audio_to_coeff.generate(batch, save_dir, pose_style)

2) test_audio2coeff.py (line 84) sets the batch['class'] variable:

batch['class'] = torch.LongTensor([pose_style]).to(self.device)

Then, on line 85, it calls audio2pose_model.test (in audio2pose.py):

results_dict_pose = self.audio2pose_model.test(batch)

3) audio2pose_model.test, on line 73, calls:

batch = self.netG.test(batch)

This is where the error occurs. netG = CVAE(cfg) (line 20 in audio2pose.py) is built from the yaml config files, so it is set up for 46 classes. The net is then asked to generate with a pose_style class of 46 in batch, which is out of range and doesn't correspond to the downloaded model.
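The failure mode can be illustrated without the real model. The sketch below assumes the network keeps one entry per pose class in a table sized from the config and indexes it with batch['class']; the names `class_table`, `LATENT_SIZE` and `lookup_class` are hypothetical and not taken from the SadTalker code.

```python
NUM_CLASSES = 46   # class count stated in the yaml configs
LATENT_SIZE = 4    # hypothetical size, chosen only for illustration

# Hypothetical per-class table: one row for each class index 0..45.
class_table = [[0.0] * LATENT_SIZE for _ in range(NUM_CLASSES)]

def lookup_class(pose_style):
    """Return the row for a pose style; fails for indices past the last row."""
    return class_table[pose_style]

# The last valid class works.
row = lookup_class(45)

# Index 46 is one past the end of the table and raises IndexError.
try:
    lookup_class(46)
    out_of_range = False
except IndexError:
    out_of_range = True

print(len(row), out_of_range)  # 4 True
```

Whatever the exact lookup in the real network is, any structure with exactly 46 entries rejects index 46 in the same way.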

Solution:

My first guess is that the maximum value of the pose style slider should simply be set to 45 (i.e. NUM_CLASSES-1) instead of 46. It could be just a typo in app.py.

Correction:

In app.py, on line 104:

pose_style = gr.Slider(minimum=0, maximum=45, step=1, label="Pose style", value=0)
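One way to keep the slider and the model in step would be to derive the slider maximum from the class count rather than typing it in twice. This is a sketch under assumptions, not the fix that was committed: `pose_style_slider_kwargs` and `check_pose_style` are hypothetical helpers, and the class count is passed in by hand rather than read from the yaml files.

```python
def pose_style_slider_kwargs(num_classes):
    """Build slider arguments so the maximum is always num_classes - 1."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return {
        "minimum": 0,
        "maximum": num_classes - 1,
        "step": 1,
        "label": "Pose style",
        "value": 0,
    }

def check_pose_style(pose_style, num_classes):
    """Reject a pose style outside 0..num_classes-1 before it reaches the model."""
    if not 0 <= pose_style < num_classes:
        raise ValueError(
            f"pose_style must be in 0..{num_classes - 1}, got {pose_style}"
        )
    return pose_style

kwargs = pose_style_slider_kwargs(46)
print(kwargs["maximum"])  # 45

# In app.py this could be used as: pose_style = gr.Slider(**kwargs)
```

The second helper is a defensive check for callers that bypass the slider; it turns an out-of-range value into a readable error instead of a crash inside the network.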

vinthony commented 1 year ago

Thanks for testing. Fixed it in a new commit.