espnet / espnet

End-to-End Speech Processing Toolkit
https://espnet.github.io/espnet/
Apache License 2.0

TTS can't use some pretrained models #5170

Closed omtrix closed 1 year ago

omtrix commented 1 year ago

I use text2speech = Text2Speech.from_pretrained() to generate Japanese audio files. When I set the model_tag to 'kan-bayashi/jsut_vits_accent_with_pause', it works correctly. When I set the model_tag to 'kan-bayashi/tsukuyomi_tts_finetune_full_band_jsut_vits_raw_phn_jacon', I get the following error:

raise RepositoryNotFoundError(message, response) from e

huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64633d6d-55c2b90d62769e1d01aafff5)

Repository Not Found for url: https://huggingface.co/api/models/kan-bayashi/tsukuyomi_tts_finetune_full_band_jsut_vits_raw_phn_jacon/revision/main. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a private or gated repo, make sure you are authenticated. Invalid username or password.

How can I solve this problem?

Fhrozen commented 1 year ago

Check this link: espnet/kan-bayashi_tsukuyomi_tts_finetune_full_band_jsut_vits_raw_phn_jaconv_pyopenjtalk_prosody_latest (https://huggingface.co/espnet/kan-bayashi_tsukuyomi_tts_finetune_full_band_jsut_vits_raw_phn_jaconv_pyopenjtalk_prosody_latest)

Fhrozen commented 1 year ago

Another one: https://huggingface.co/espnet/kan-bayashi_tsukuyomi_full_band_vits_prosody

omtrix commented 1 year ago

Yes, I changed the model_tag to 'kanbayashi/tsukuyomi_tts_finetune_full_band_jsut_vits_raw_phn_jaconv_pyopenjtalk_prosody_latest' and then it works. So it seems I'm supposed to use the path from the URL as the model_tag? Then I tried the first model, named "kan-bayashi/jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_pyopenjt", but I can't run it successfully even when I use the path from the URL. Is there something wrong with my understanding?

Fhrozen commented 1 year ago

Could you tell me where you are getting that tag from: kan-bayashi/jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_pyopenjt? I am not sure it exists. (Do you mean this one: kan-bayashi/jvs_tts_finetune_jvs010_jsut_vits_raw_phn_jaconv_pyopenjtalk_prosody_latest?)

You need to check the list of existing models. For Zenodo, see: https://github.com/espnet/espnet_model_zoo/blob/master/espnet_model_zoo/table.csv (not recommended; it will be deprecated).

For Hugging Face, use the model tag espnet/kan-bayashi_tsukuyomi_tts_finetune_full_band_jsut_vits_raw_phn_jaconv_pyopenjtalk_prosody_latest, that is, the model URL without the https://huggingface.co/ prefix.
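As a rough illustration of deriving a model tag from a Hugging Face model page URL, the sketch below strips the site prefix using only the Python standard library. The helper name hf_url_to_model_tag is hypothetical (it is not part of ESPnet or huggingface_hub), and it assumes the URL points at a model page of the form https://huggingface.co/<user>/<model_name>.

```python
from urllib.parse import urlparse


def hf_url_to_model_tag(url: str) -> str:
    """Return '<user>/<model_name>' from a Hugging Face model page URL.

    Hypothetical helper for illustration; assumes a model page URL.
    """
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        raise ValueError(f"not a huggingface.co URL: {url!r}")
    # Keep only the first two path segments; drop extras such as /tree/main.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL does not name a model repository: {url!r}")
    return "/".join(parts[:2])


url = "https://huggingface.co/espnet/kan-bayashi_tsukuyomi_full_band_vits_prosody"
model_tag = hf_url_to_model_tag(url)
print(model_tag)  # espnet/kan-bayashi_tsukuyomi_full_band_vits_prosody
```

The resulting string is what would be passed as model_tag; nothing in the repository name itself is changed.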

omtrix commented 1 year ago

I got "kan-bayashi/jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_pyopenjt" from below the 'model card' (which seems to be the wrong place). I just want to use another model from https://huggingface.co/models?language=ja&library=espnet&pipeline_tag=text-to-speech&sort=downloads. The model URL is https://huggingface.co/espnet/kan-bayashi_jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_pyopenjta-truncated-178804, so I set my model_tag to "kan-bayashi/jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_pyopenjta-truncated-178804", but it didn't work.

Also, when I looked for the name in the model_zoo, I couldn't find it. So when I want to use a model, I'm supposed to check whether it is in the model_zoo, right?

Fhrozen commented 1 year ago

No. If you are using a model from Zenodo, use as model_tag the one supplied in the model card in the repo (espnet_model_zoo). For a model from Hugging Face, use the repository name, <user>/<model_name>, as model_tag. In your case it will be: model_tag="espnet/kan-bayashi_jvs_tts_finetune_jvs001_jsut_vits_raw_phn_jaconv_pyopenjta-truncated-178804". Include the espnet prefix as well, and do not change anything in the name if you are using a model from Hugging Face.
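A minimal sketch of loading a model with a tag in the <user>/<model_name> form might look like the following. The helpers split_model_tag and load_text2speech are hypothetical names introduced for illustration. The check only verifies the shape of the tag and cannot tell whether the repository actually exists. Loading assumes that espnet and espnet_model_zoo are installed and that the model can be downloaded.

```python
from typing import Tuple


def split_model_tag(model_tag: str) -> Tuple[str, str]:
    """Split '<user>/<model_name>' into its two parts (shape check only)."""
    user, sep, name = model_tag.partition("/")
    if not sep or not user or not name or "/" in name:
        raise ValueError(
            f"expected '<user>/<model_name>', got {model_tag!r}"
        )
    return user, name


def load_text2speech(model_tag: str):
    """Load a pretrained TTS model by tag (downloads on first use)."""
    split_model_tag(model_tag)  # fail early on a full URL or a bare model name
    # Imported lazily so the shape check above works without ESPnet installed.
    from espnet2.bin.tts_inference import Text2Speech

    return Text2Speech.from_pretrained(model_tag=model_tag)


print(split_model_tag("espnet/kan-bayashi_tsukuyomi_full_band_vits_prosody"))
```

Calling load_text2speech with the same tag would then return the Text2Speech object used for synthesis.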

omtrix commented 1 year ago

Yes, I just ran every model successfully with the format "espnet/****". I think I ran into this problem because I used to set model_tag to "kan-bayashi/ljspeech_vits" or "kan-bayashi/jsut_vits_accent_with_pause", which ran successfully, but when I used other models in that format, something went wrong. Thanks for your reply.

Fhrozen commented 1 year ago

Glad to read that it worked. Just remember: if you are using a Hugging Face model, include the user as well. In some cases it will be espnet; in other cases it will be the user who trained the model, such as lichenda/wsj0_2mix_skim_noncausal or mio/amadeus.

Also, please close the issue if it is solved.