coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Error while trying to use tortoise-v2 #2621

Closed RuntimeRaider closed 1 year ago

RuntimeRaider commented 1 year ago

Describe the bug

Hi,

I'm attempting to use the tortoise-v2 model for inference.

All I've done so far:

git clone https://github.com/coqui-ai/TTS
cd TTS
pip install -e .[all]
tts --text "Text for TTS" --model_name "tts_models/en/multi-dataset/tortoise-v2" --out_path speech.wav

I got this error on the first attempt:

> Downloading model to /home/vscode/.local/share/tts/tts_models--en--multi-dataset--tortoise-v2
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.72G/1.72G [00:37<00:00, 45.7MiB/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 976M/976M [00:21<00:00, 45.8MiB/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 151M/151M [00:04<00:00, 32.1MiB/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.17G/1.17G [00:25<00:00, 46.0MiB/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25.2M/25.2M [00:01<00:00, 17.0MiB/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 101M/101M [00:03<00:00, 28.2MiB/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391M/391M [00:09<00:00, 41.7MiB/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.07k/1.07k [00:01<00:00, 868iB/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.40k/4.40k [00:00<00:00, 700kiB/s]
 > Model's license - apache 2.0
 > Check https://choosealicense.com/licenses/apache-2.0/ for more info.
 > Using model: tortoise
Downloading (…)lve/main/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 2.11k/2.11k [00:00<00:00, 20.7MB/s]
Downloading pytorch_model.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1.26G/1.26G [00:27<00:00, 46.2MB/s]
Traceback (most recent call last):
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "/home/vscode/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://huggingface.co/facebook/wav2vec2-large-960h/resolve/main/preprocessor_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 301, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 502 Server Error: Bad Gateway for url: https://huggingface.co/facebook/wav2vec2-large-960h/resolve/main/preprocessor_config.json

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python/current/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/workspaces/ai/TTS/TTS/bin/synthesize.py", line 385, in main
    synthesizer = Synthesizer(
  File "/workspaces/ai/TTS/TTS/utils/synthesizer.py", line 101, in __init__
    self._load_tts_from_dir(model_dir, use_cuda)
  File "/workspaces/ai/TTS/TTS/utils/synthesizer.py", line 144, in _load_tts_from_dir
    self.tts_model = setup_tts_model(config)
  File "/workspaces/ai/TTS/TTS/tts/models/__init__.py", line 13, in setup_model
    model = MyModel.init_from_config(config=config, samples=samples)
  File "/workspaces/ai/TTS/TTS/tts/models/tortoise.py", line 838, in init_from_config
    return Tortoise(config)
  File "/workspaces/ai/TTS/TTS/tts/models/tortoise.py", line 336, in __init__
    self.aligner = Wav2VecAlignment()
  File "/workspaces/ai/TTS/TTS/tts/layers/tortoise/wav2vec_alignment.py", line 51, in __init__
    self.feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-960h")
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/transformers/feature_extraction_utils.py", line 330, in from_pretrained
    feature_extractor_dict, kwargs = cls.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/transformers/feature_extraction_utils.py", line 430, in get_feature_extractor_dict
    resolved_feature_extractor_file = cached_file(
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/transformers/utils/hub.py", line 475, in cached_file
    raise EnvironmentError(f"There was a specific connection error when trying to load {path_or_repo_id}:\n{err}")
OSError: There was a specific connection error when trying to load facebook/wav2vec2-large-960h:
502 Server Error: Bad Gateway for url: https://huggingface.co/facebook/wav2vec2-large-960h/resolve/main/preprocessor_config.json

Could you please help me get this working?

Thanks

To Reproduce

I ran the command again to see whether it would download preprocessor_config.json successfully, but it still fails with this error:

vscode ➜ /workspaces/ai/TTS (dev) $ tts --text "Text for TTS" --model_name "tts_models/en/multi-dataset/tortoise-v2" --out_path speech.wav
 > tts_models/en/multi-dataset/tortoise-v2 is already downloaded.
 > Model's license - apache 2.0
 > Check https://choosealicense.com/licenses/apache-2.0/ for more info.
 > Using model: tortoise
Downloading (…)rocessor_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 159/159 [00:00<00:00, 1.65MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████| 85.0/85.0 [00:00<00:00, 859kB/s]
Traceback (most recent call last):
  File "/usr/local/python/current/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/workspaces/ai/TTS/TTS/bin/synthesize.py", line 385, in main
    synthesizer = Synthesizer(
  File "/workspaces/ai/TTS/TTS/utils/synthesizer.py", line 101, in __init__
    self._load_tts_from_dir(model_dir, use_cuda)
  File "/workspaces/ai/TTS/TTS/utils/synthesizer.py", line 144, in _load_tts_from_dir
    self.tts_model = setup_tts_model(config)
  File "/workspaces/ai/TTS/TTS/tts/models/__init__.py", line 13, in setup_model
    model = MyModel.init_from_config(config=config, samples=samples)
  File "/workspaces/ai/TTS/TTS/tts/models/tortoise.py", line 838, in init_from_config
    return Tortoise(config)
  File "/workspaces/ai/TTS/TTS/tts/models/tortoise.py", line 336, in __init__
    self.aligner = Wav2VecAlignment()
  File "/workspaces/ai/TTS/TTS/tts/layers/tortoise/wav2vec_alignment.py", line 52, in __init__
    self.tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("jbetker/tacotron-symbols")
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/python/3.10.11/lib/python3.10/site-packages/transformers/models/wav2vec2/tokenization_wav2vec2.py", line 189, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Expected behavior

No response

Logs

No response

Environment

- 🐸TTS version v0.14.0
- No GPU
- Running this inside a Docker Engine Ubuntu 22.04 container.

Additional context

No response

erogol commented 1 year ago

It is caused by a random hiccup on the HuggingFace side. I'll just disable redaction, so it should never happen again.
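
For readers who hit the same 502 before picking up the updated config: the first traceback ends in an OSError raised while fetching facebook/wav2vec2-large-960h, so one possible workaround is to retry the failing call a few times with a growing delay. The sketch below is a generic helper, not part of 🐸TTS or transformers; the name retry_transient and its parameters are hypothetical, and retrying on OSError is an assumption based only on the exception type shown in the traceback (it will also retry non-transient OSErrors).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    Hypothetical helper: the delay doubles after every failed attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...). The last failure
    is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Possible usage, wrapping the call that failed in the traceback above
# (requires transformers to be installed):
#
# from transformers import Wav2Vec2FeatureExtractor
# extractor = retry_transient(
#     lambda: Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-960h")
# )
```

Simply re-running the tts command, as the reporter did, has the same effect for a one-off hiccup; a helper like this only matters when the load happens inside a longer script.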

erogol commented 1 year ago

OK, I updated the model config. You should remove the TTS cache folder and try again.
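
A minimal sketch of the cache cleanup, assuming the layout shown in the reporter's download log: the model was saved to /home/vscode/.local/share/tts/tts_models--en--multi-dataset--tortoise-v2, i.e. the model name with "/" replaced by "--" under a cache root. The helper name remove_cached_model is hypothetical, and the cache root may differ on other platforms or setups, so check the "Downloading model to ..." line in your own output first. This removes only the one model's directory rather than the whole cache folder.

```python
import shutil
from pathlib import Path


def remove_cached_model(cache_root: Path, model_name: str) -> bool:
    """Delete the cached directory for model_name under cache_root.

    Assumes the directory is named after the model with "/" replaced
    by "--", as seen in the download log. Returns True if a directory
    was removed, False if nothing was cached there.
    """
    model_dir = cache_root / model_name.replace("/", "--")
    if not model_dir.is_dir():
        return False
    shutil.rmtree(model_dir)
    return True


if __name__ == "__main__":
    # Cache root taken from the log in this issue; adjust for your system.
    root = Path.home() / ".local" / "share" / "tts"
    removed = remove_cached_model(root, "tts_models/en/multi-dataset/tortoise-v2")
    print("removed" if removed else "nothing to remove")
```

After the directory is gone, running the original tts command again downloads the model together with the updated config.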