choijeongsoo / lip2speech-unit

[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units

Issue with Second stage inference: Config File and Checkpoint Size Mismatch #9

Closed kenxxxxx closed 3 weeks ago

kenxxxxx commented 3 months ago

Hello,

Thank you for your excellent work on the lip2speech-unit project. I am currently trying to perform inference using the instructions provided. However, I encountered a problem related to the configuration file:

In your code, specifically this section:

    if os.path.isdir(a.checkpoint_file):
        config_file = os.path.join(a.checkpoint_file, 'config.json')
    else:
        config_file = os.path.join(os.path.split(a.checkpoint_file)[0], 'config2.json')

It seems the necessary config.json or config2.json files are not provided in the repository. To proceed, I downloaded a configuration file from the HiFi-GAN repository. However, when I attempt to run the inference, I encounter multiple size mismatch errors, particularly for layers like resblocks, conv_post, and dict.weight. Here is an example of the errors:

    size mismatch for resblocks.10.convs1.1.weight_g: copying a param with shape torch.Size([32, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 1, 1]).
    size mismatch for resblocks.10.convs1.1.weight_v: copying a param with shape torch.Size([32, 32, 7]) from checkpoint, the shape in current model is torch.Size([8, 8, 7]).
    ...

Could you provide the correct configuration files or detailed guidance on how to modify the model or the configuration to avoid these size mismatches? Any help or detailed instructions would be greatly appreciated.

Thank you in advance!
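
For readers hitting the same errors, here is a minimal sketch of how the mismatch could be narrowed down to a single config value. It is not code from this repository. `diff_shapes` and `guess_upsample_initial_channel` are hypothetical helpers, and the second one assumes the vocoder follows the upstream HiFi-GAN generator layout, where each upsample stage halves the channel count and adds one resblock per kernel size (three kernel sizes assumed here). With PyTorch, the shape dictionaries could be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`; the values below are copied from the error messages above.

```python
def diff_shapes(checkpoint_shapes, model_shapes):
    """List (name, checkpoint_shape, model_shape) for parameters whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


def guess_upsample_initial_channel(resblock_index, channels, num_kernels=3):
    """Estimate upsample_initial_channel from one resblock's channel count.

    Assumption (upstream HiFi-GAN layout, not confirmed for this repo):
    resblock i belongs to upsample stage i // num_kernels, and stage s
    uses upsample_initial_channel // 2 ** (s + 1) channels.
    """
    stage = resblock_index // num_kernels
    return channels * 2 ** (stage + 1)


# Shapes taken from the error messages quoted in this issue.
checkpoint_shapes = {
    "resblocks.10.convs1.1.weight_g": (32, 1, 1),
    "resblocks.10.convs1.1.weight_v": (32, 32, 7),
}
model_shapes = {
    "resblocks.10.convs1.1.weight_g": (8, 1, 1),
    "resblocks.10.convs1.1.weight_v": (8, 8, 7),
}

for name, ckpt, model in diff_shapes(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt} vs model {model}")

print("checkpoint expects:", guess_upsample_initial_channel(10, 32))
print("loaded config gives:", guess_upsample_initial_channel(10, 8))
```

Under those assumptions, the checkpoint was trained with a wider generator than the downloaded config describes, which would explain why every resblock and conv_post layer fails to load. The fix is to use the config that was saved alongside the checkpoint rather than editing layer sizes by hand.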

choijeongsoo commented 3 months ago

Hello, thank you for your interest in our work!

You can find config.json in checkpoints/config.json.
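
To check that the config sits where the inference script will look for it, the lookup quoted in the question can be reproduced as a standalone helper. This is only a sketch mirroring that quoted snippet: `resolve_config_path` is a hypothetical name, the file names `config.json` and `config2.json` are taken from the snippet as posted, and the example checkpoint path is made up.

```python
import os


def resolve_config_path(checkpoint_file):
    """Return the config path the quoted snippet would derive from a checkpoint argument."""
    if os.path.isdir(checkpoint_file):
        # A directory was passed: the config is expected inside it.
        return os.path.join(checkpoint_file, "config.json")
    # A file was passed: the config is expected next to it.
    return os.path.join(os.path.split(checkpoint_file)[0], "config2.json")


# Hypothetical checkpoint location, used only for illustration.
checkpoint_file = os.path.join("checkpoints", "vocoder", "g_checkpoint")
config_file = resolve_config_path(checkpoint_file)
print("expected config:", config_file)
print("exists:", os.path.isfile(config_file))
```

If the second line prints `False`, copying or symlinking checkpoints/config.json to the printed location (or passing the checkpoint directory instead of a file) should let the script find it.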