edwko / OuteTTS

Interface for OuteTTS models.
Apache License 2.0

Invalid sample rate when trying tts #17

Closed: fengwang closed this issue 3 days ago

fengwang commented 6 days ago

Hi, I tried the code below:

from outetts.v0_1.interface import InterfaceHF, InterfaceGGUF

# Initialize the interface with the Hugging Face model
interface = InterfaceHF("OuteAI/OuteTTS-0.1-350M")

# Or initialize the interface with a GGUF model
# interface = InterfaceGGUF("path/to/model.gguf")

# Generate TTS output
# Without a speaker reference, the model generates speech with random speaker characteristics
output = interface.generate(
    text="Hello, am I working?",
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096
)

# Play the generated audio
output.play()

# Save the generated audio to a file
output.save("output.wav")

And I received an unexpected error message:

sounddevice.PortAudioError: Error opening OutputStream: Invalid sample rate [PaErrorCode -9997]

The full console output is:

2024-11-09 19:49:16.230 | INFO     | outetts.v0_1.audio_codec:ensure_model_exists:37 - Downloading WavTokenizer model from https://huggingface.co/novateur/WavTokenizer-large-speech-75token/resolve/main/wavtokenizer_large_speech_320_24k.ckpt
/data/home_feng/.cache/outeai/tts/wavtokenizer_large_speech_75_token/wavtokenizer_large_speech_320_24k.ckpt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.63G/1.63G [05:31<00:00, 5.29MiB/s]
/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
making attention of type 'vanilla' with 768 in_channels
/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/outetts/v0_1/decoder/pretrained.py:101: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict_raw = torch.load(model_path, map_location="cpu")['state_dict']
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 973k/973k [00:00<00:00, 2.40MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.32M/5.32M [00:00<00:00, 5.69MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83.1k/83.1k [00:00<00:00, 1.95MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 744/744 [00:00<00:00, 5.08MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 724M/724M [02:33<00:00, 4.73MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 1.18MB/s]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2050
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->playback, outParams, self->primeBuffers, hwParamsPlayback, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2724
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2845
Traceback (most recent call last):
  File "/data/home_feng/tmp/opentts/tts.py", line 19, in <module>
    output.play()
  File "/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/outetts/v0_1/interface.py", line 21, in play
    sd.play(self.audio[0].cpu().numpy(), self.sr)
  File "/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/sounddevice.py", line 178, in play
    ctx.start_stream(OutputStream, samplerate, ctx.output_channels,
  File "/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/sounddevice.py", line 2626, in start_stream
    self.stream = StreamClass(samplerate=samplerate,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/sounddevice.py", line 1515, in __init__
    _StreamBase.__init__(self, kind='output', wrap_callback='array',
  File "/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/sounddevice.py", line 909, in __init__
    _check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
  File "/data/home_feng/environment/python.3.12/lib/python3.12/site-packages/sounddevice.py", line 2796, in _check
    raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening OutputStream: Invalid sample rate [PaErrorCode -9997]
edwko commented 5 days ago

The error comes from the sounddevice library and is likely due to your audio hardware or ALSA driver setup not supporting a 24000 Hz sample rate. The simplest workaround is to remove output.play(): that call only plays the audio directly and is not needed to save it to a file. You can still use output.save("output.wav") and play the saved file with another player. Alternatively, you could resample the audio to a supported sample rate, likely 44100 Hz. You may also want to check for related reports in the sounddevice GitHub issues.
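
A minimal sketch of that workaround, assuming only that the output object exposes play() and save(path) as in the snippet from the original report. The helper name play_or_save and the broad except clause are illustrative choices and are not part of outetts:

```python
def play_or_save(output, path="output.wav"):
    """Try direct playback; if the audio backend rejects it, save to a file.

    `output` is assumed to expose play() and save(path), like the object
    returned by interface.generate(). This helper is hypothetical.
    """
    try:
        output.play()
        return "played"
    except Exception as exc:  # e.g. sounddevice.PortAudioError: Invalid sample rate
        print(f"Playback failed ({exc}); saving to {path} instead")
        output.save(path)
        return "saved"
```

Calling play_or_save(output) in place of output.play() would keep the script running on machines whose audio device rejects the model's sample rate.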

If you'd like to try resampling and know your supported sample rate, here’s an example:

import torchaudio
new_sr = 44100  # Set your supported sample rate here
resampler = torchaudio.transforms.Resample(orig_freq=output.sr, new_freq=new_sr).to(output.audio.device)
resampled_audio = resampler(output.audio)
output.sr = new_sr
output.audio = resampled_audio

# Now you can play the resampled audio
output.play()
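
If you do not know which rate your device accepts, one way to find a value for new_sr is to probe a few common rates. The sketch below assumes that sounddevice.check_output_settings raises an exception when the default output device rejects a rate; the helper names pick_supported_rate and sounddevice_supports are hypothetical, and the candidate list is only a guess at typical hardware rates:

```python
from typing import Callable, Iterable, Optional


def pick_supported_rate(
    native_sr: int,
    candidates: Iterable[int],
    is_supported: Callable[[int], bool],
) -> Optional[int]:
    """Return the first rate accepted by is_supported, trying native_sr first."""
    for sr in [native_sr, *candidates]:
        if is_supported(sr):
            return sr
    return None  # no candidate was accepted


def sounddevice_supports(sr: int) -> bool:
    """Ask sounddevice whether the default output device accepts this rate."""
    import sounddevice as sd  # imported lazily so the picker works without it

    try:
        sd.check_output_settings(samplerate=sr)
        return True
    except (sd.PortAudioError, ValueError):
        return False
```

For example, new_sr = pick_supported_rate(output.sr, [44100, 48000], sounddevice_supports) would return the model's own rate when the device accepts it, so resampling is only needed when the result differs from output.sr.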
fengwang commented 3 days ago

Many thanks for the kind support. The problem was solved after removing the play() call.