MoonInTheRiver / DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
MIT License

Inference SVS #59

Open ElizavetaSedova opened 2 years ago

ElizavetaSedova commented 2 years ago

Hello! Great job! I would like to know a few things. I am interested in SVS (POPCS).
1) Can you tell me about inference? Which files are used for inference? What is the recipe? How did you manage to reproduce the melody (notes) if MIDI is not used?
2) Can I run inference for English? What can I do about it? I understand that the accent will remain Chinese. Are you planning further work on other languages?

MoonInTheRiver commented 2 years ago

1) I think you should read this file to get a better understanding: https://github.com/MoonInTheRiver/DiffSinger/blob/master/docs/README-SVS.md
2) The phoneme dictionary for EN is not the same as the one for ZH, so the answer is no. You should either re-train the model using the International Phonetic Alphabet (IPA) or re-train it on EN datasets.
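To illustrate why a ZH-trained model cannot simply read EN phonemes, here is a minimal sketch of a phoneme-to-ID vocabulary. It assumes, for illustration only, a made-up reserved-token list and tiny made-up phoneme subsets; these are not DiffSinger's real dictionaries or encoder. The two points it shows: the embedding table needs one row per vocabulary entry, so a different phoneme set changes that size, and phonemes outside the training dictionary collapse to an unknown token.

```python
from typing import Dict, List

# Hypothetical reserved tokens for this sketch; real projects define their own.
RESERVED = ["<pad>", "<EOS>", "<UNK>"]


def build_vocab(phonemes: List[str]) -> Dict[str, int]:
    """Map each phoneme to an integer ID; reserved tokens take the first IDs."""
    vocab = {tok: i for i, tok in enumerate(RESERVED)}
    for ph in sorted(set(phonemes)):
        if ph not in vocab:
            vocab[ph] = len(vocab)
    return vocab


def encode(phones: List[str], vocab: Dict[str, int]) -> List[int]:
    """Convert phonemes to IDs, falling back to <UNK> for unseen symbols."""
    unk = vocab["<UNK>"]
    return [vocab.get(p, unk) for p in phones]


# Tiny illustrative subsets, NOT the real ZH/EN phoneme sets.
zh_vocab = build_vocab(["a", "ai", "n", "h", "sh", "i", "AP", "SP"])
en_phones = ["HH", "AH", "L", "OW"]  # ARPAbet-style "hello"

print(len(zh_vocab))                # 11 rows needed in the embedding table
print(encode(en_phones, zh_vocab))  # [2, 2, 2, 2]: every EN phoneme is <UNK>
```

Retraining with a larger or different phoneme set changes the vocabulary size, which in turn changes the first dimension of the encoder's embedding matrix.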

11721206 commented 2 years ago

When running inference with phoneme input, there is an error: "can't convert np.ndarray of type numpy.str_ ........."
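This error message is the kind PyTorch gives when it is asked to turn an array of strings into a tensor, for example via `torch.from_numpy`. One plausible cause (an assumption; the thread does not confirm it) is that some input field still holds phoneme strings rather than integer IDs at the point where it becomes a tensor. A minimal sketch of converting phonemes to IDs up front, with a hypothetical helper `phonemes_to_ids` and a made-up dictionary:

```python
from typing import Dict, List, Sequence


def phonemes_to_ids(phones: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Convert phoneme strings to integer IDs, failing loudly on unknown symbols."""
    unknown = [p for p in phones if p not in vocab]
    if unknown:
        raise ValueError(f"phonemes not in dictionary: {unknown}")
    return [vocab[p] for p in phones]


# Hypothetical dictionary and input; a real run would load the phoneme set used in training.
vocab = {"<pad>": 0, "n": 1, "i": 2, "h": 3, "ao": 4, "SP": 5}
phone_str = "n i h ao SP"

ids = phonemes_to_ids(phone_str.split(), vocab)
print(ids)  # [1, 2, 3, 4, 5]
```

The resulting list of Python ints can be wrapped with `torch.LongTensor(ids)`, whereas a NumPy array of the raw phoneme strings cannot be converted.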

michaellin99999 commented 1 year ago

We have done exactly that (retrained on our own EN dataset) and get this error when running SVS inference:

Traceback (most recent call last):
  File "inference/svs/ds_e2e.py", line 71, in <module>
    DiffSingerE2EInfer.example_run(c)
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/inference/svs/base_svs_infer.py", line 240, in example_run
    infer_ins = cls(hparams)
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/inference/svs/base_svs_infer.py", line 35, in __init__
    self.model = self.build_model()
  File "inference/svs/ds_e2e.py", line 26, in build_model
    load_ckpt(model, hparams['work_dir'], 'model')
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/utils/__init__.py", line 202, in load_ckpt
    cur_model.load_state_dict(state_dict, strict=strict)
  File "/home/wonder/anaconda3/envs/diffsinger_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GaussianDiffusion:
    size mismatch for fs2.encoder_embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).
    size mismatch for fs2.encoder.embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).

The [72, 256] shape comes from retraining on our own EN dataset. We don't know where the current model's torch.Size([64, 256]) is coming from. Any tips?
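One way to narrow this down: the first dimension of an embedding weight is the number of token IDs the model was built for, so 72 vs 64 suggests the inference run is constructing its encoder from a smaller phoneme set than the one used in training (for example, a config or dictionary that still points at the original data). That reading is an assumption, not something confirmed in this thread. The sketch below compares parameter shapes using plain dicts; the helper names are hypothetical, and in practice the dicts could be filled from the loaded checkpoint's state dict and from `model.state_dict()`, e.g. `{k: tuple(v.shape) for k, v in sd.items()}`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(ckpt_shapes: Dict[str, Shape],
                          model_shapes: Dict[str, Shape]) -> List[str]:
    """Return one message per parameter present in both dicts with different shapes."""
    problems = []
    for name, ckpt_shape in sorted(ckpt_shapes.items()):
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            problems.append(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
    return problems


def implied_vocab_size(shapes: Dict[str, Shape], embed_key: str) -> int:
    """Rows of an embedding matrix = number of token IDs it can look up."""
    return shapes[embed_key][0]


# Shapes copied from the traceback above.
ckpt = {"fs2.encoder_embed_tokens.weight": (72, 256),
        "fs2.encoder.embed_tokens.weight": (72, 256)}
model = {"fs2.encoder_embed_tokens.weight": (64, 256),
         "fs2.encoder.embed_tokens.weight": (64, 256)}

for msg in find_shape_mismatches(ckpt, model):
    print(msg)

key = "fs2.encoder.embed_tokens.weight"
gap = implied_vocab_size(ckpt, key) - implied_vocab_size(model, key)
print(f"inference model has {gap} fewer phoneme tokens than the checkpoint")
```

If the gap matches the difference between the EN phoneme dictionary and the dictionary the inference config loads, pointing inference at the same phoneme set used for training should make the shapes agree.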

manhdoan291 commented 8 months ago

@michaellin99999 Hello, have you managed to get this working yet?