QuyAnh2005 / vits-japanese

Text to Speech for Japanese
14 stars 5 forks

Any pre-trained models? #2

Open csukuangfj opened 1 year ago

csukuangfj commented 1 year ago

Thank you for open-sourcing the VITS code for Japanese.

Could you also release the pre-trained models? I would like to provide a C++ runtime based on onnxruntime for it.

We already support all VITS models from piper (https://huggingface.co/spaces/k2-fsa/text-to-speech). However, there are no Japanese models from piper, so it would be great if you could provide a pre-trained model for Japanese.

QuyAnh2005 commented 1 year ago

Hi @csukuangfj, you can download the pretrained VITS model for Japanese from https://www.dropbox.com/scl/fi/l79c7eqb1mcz40dcv1p38/G_1128000.pth?rlkey=atb1aceydcp959z6ajrhcp8gt&dl=0

csukuangfj commented 1 year ago

> Hi @csukuangfj, you can download the pretrained VITS model for Japanese from https://www.dropbox.com/scl/fi/l79c7eqb1mcz40dcv1p38/G_1128000.pth?rlkey=atb1aceydcp959z6ajrhcp8gt&dl=0

@QuyAnh2005

Thank you for your quick response.

Is the given pre-trained model compatible with the following config? https://github.com/QuyAnh2005/vits-japanese/blob/main/configs/jp_base.json

QuyAnh2005 commented 1 year ago

Yes, it is compatible @csukuangfj
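For readers wanting to check this kind of compatibility themselves: VITS training scripts typically save the generator weights under a "model" key alongside the iteration count, and compatibility comes down to tensor shapes matching the config's dimensions. A minimal sketch using a simulated checkpoint (the tensor name and shape below are illustrative, not the real model's):

```python
import torch

# Simulated checkpoint in the layout VITS training scripts usually write:
# generator weights under "model", plus the training iteration.
torch.save(
    {"model": {"enc_p.emb.weight": torch.zeros(40, 192)},
     "iteration": 1128000},
    "G_demo.pth",
)

ckpt = torch.load("G_demo.pth", map_location="cpu")
# Config compatibility boils down to shapes matching the config values,
# e.g. the embedding dim should equal the config's "hidden_channels".
print("iteration:", ckpt["iteration"])
print("embedding shape:", tuple(ckpt["model"]["enc_p.emb.weight"].shape))
```

The same inspection works on the real `G_1128000.pth` once downloaded.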

csukuangfj commented 1 year ago

> Yes, it is compatible @csukuangfj

Thanks a lot!

csukuangfj commented 12 months ago

This repo uses https://pypi.org/project/unidic-lite/#files, whose dictionary is 248 MB after installation. That dictionary is too large.

Is there a plan to switch to https://github.com/espeak-ng/espeak-ng instead?

QuyAnh2005 commented 12 months ago

@csukuangfj Maybe, but I think we would need to train the model again to get new pretrained weights

csukuangfj commented 12 months ago

> @csukuangfj Maybe, but I think we would need to train the model again to get new pretrained weights

That would be great!

espeak-ng is used in piper and we have converted all VITS models from piper to sherpa-onnx. The models are available at https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models

A 248 MB dictionary is too large for embedded and mobile devices.

If you can provide a Japanese VITS model using espeak-ng, I can provide a runtime for it that supports Android/iOS/Raspberry Pi, etc.

(screenshot attached, 2023-12-07 at 10:58:04)

QuyAnh2005 commented 1 month ago

Hi @csukuangfj, the new requirements for the inference phase include only:

torch==2.0.0
scipy==1.10.1
mecab-python3
unidic-lite
pykakasi
librosa==0.8.0
monotonic-align==1.0.0

and unidic-lite now takes only about 48 MB. Could you convert this model into sherpa-onnx?