collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.
https://collabora.github.io/WhisperSpeech/
MIT License
3.8k stars 207 forks

error with hq-fast-en model... #121

Closed BBC-Esq closed 6 months ago

BBC-Esq commented 6 months ago

When using the text_to_audio_playback.py script in the Examples folder I get this error...

Failed to load the S2A model:
Traceback (most recent call last):
  File "C:\PATH\Scripts\WhisperSpeech_working\Lib\site-packages\whisperspeech\pipeline.py", line 59, in __init__
    self.s2a = SADelARTransformer.load_model(**args, device=device)  # use obtained compute device
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PATH\Scripts\WhisperSpeech_working\Lib\site-packages\whisperspeech\s2a_delar_mup_wds_mlang.py", line 415, in load_model
    model = cls(**spec['config'], tunables=Tunables(**Tunables.upgrade(spec['tunables'])))
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Tunables.__init__() got an unexpected keyword argument 'force_hidden_to_emb'

C:\PATH\Scripts\WhisperSpeech_working\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Exception in thread Thread-1 (process_text_to_audio):
Traceback (most recent call last):
  File "C:\Users\Airflow\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\Airflow\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\PATH\Scripts\WhisperSpeech_working\text_to_audio_playback.py", line 45, in process_text_to_audio
    audio_tensor = pipe.generate(sentence)  # Generate audio tensor for the sentence
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PATH\Scripts\WhisperSpeech_working\Lib\site-packages\whisperspeech\pipeline.py", line 100, in generate
    return self.vocoder.decode(self.generate_atoks(text, speaker, lang=lang, cps=cps, step_callback=step_callback))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PATH\Scripts\WhisperSpeech_working\Lib\site-packages\whisperspeech\pipeline.py", line 96, in generate_atoks
    atoks = self.s2a.generate(stoks, speaker.unsqueeze(0), step=step_callback)

The strange thing is that when I use the "q4-base-en" and other models I don't get this error. I tried it with PyTorch 2.1.2 and 2.2.0...
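For context, the `TypeError` above happens because the model checkpoint was saved by a newer release of the library, so its stored `tunables` dict contains a key (`force_hidden_to_emb`) that the older installed `Tunables` dataclass does not accept. A minimal sketch of the failure mode and a defensive workaround (the dataclass fields here are hypothetical stand-ins, not the real `whisperspeech` ones):

```python
from dataclasses import dataclass, fields

@dataclass
class Tunables:
    # hypothetical fields standing in for the real whisperspeech Tunables
    lr: float = 1e-3
    rope: bool = True

def drop_unknown(stored: dict) -> dict:
    """Filter out keys saved by a newer library version that this
    Tunables dataclass does not know about (e.g. 'force_hidden_to_emb')."""
    known = {f.name for f in fields(Tunables)}
    return {k: v for k, v in stored.items() if k in known}

# A checkpoint saved by a newer release carries an extra tunable:
spec_tunables = {'lr': 2e-4, 'force_hidden_to_emb': True}

# Tunables(**spec_tunables) would raise the TypeError from the traceback;
# filtering against the known fields avoids it:
t = Tunables(**drop_unknown(spec_tunables))
```

This only illustrates the mismatch; the actual fix (below) was to install a version of the library whose `Tunables` knows the new key.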

BBC-Esq commented 6 months ago

Resolved by installing the latest code from the repository rather than PyPI's 0.8 release...
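For anyone hitting the same error, installing straight from the GitHub repository (rather than the PyPI release) looks like this:

```shell
# Replace the PyPI release with the current repository code
pip uninstall -y whisperspeech
pip install -U git+https://github.com/collabora/WhisperSpeech.git
```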