coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

Not working for multiple sentences #3574

Closed ayushi15092002 closed 1 month ago

ayushi15092002 commented 5 months ago

Describe the bug

When I send text containing more than one sentence, the request fails with the error shown under Logs.

To Reproduce

Input Text: a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are usable as lumber or plants above a specified height.

import os

from flask import Flask, request, send_file  # assumed; the Flask setup is not shown in the report
from TTS.api import TTS

TTS_MODEL_PATH = "tts_models/multilingual/multi-dataset/xtts_v2"
tts = TTS(model_name=TTS_MODEL_PATH, gpu=True)

app = Flask(__name__)  # assumed; not shown in the report


def text_to_speech(text, speaker_wav_file, language, output_path):
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    tts.tts_to_file(text, speaker_wav=speaker_wav_file, language=language, file_path=output_path)
    return output_path


@app.route('/text-to-speech', methods=['POST'])
def text_to_speech_api():
    try:
        text = request.form.get('text')
        language = request.form.get('language')
        speaker_wav_file = request.files['speaker_wav']
        output_path = '/data/ai-tools/voice_cloning/output.wav'  # Update this with the actual path
        output_file = text_to_speech(text, speaker_wav_file, language, output_path)
        return send_file(output_file)
    except Exception as e:
        return str(e), 500

Expected behavior

No response

Logs

Getting Error: Failed to open the input "Custom Input Context" (Invalid data found when processing input).
Exception raised from get_input_format_context at /__w/audio/audio/pytorch/audio/torchaudio/csrc/ffmpeg/stream_reader/stream_reader.cpp:42 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f4ea370f617 in /data/ai-tools/venv/lib64/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f4ea36ca98d in /data/ai-tools/venv/lib64/python3.9/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x43944 (0x7f4d9c583944 in /data/ai-tools/venv/lib/python3.9/site-packages/torchaudio/lib/libtorchaudio_ffmpeg6.so)
frame #3: torchaudio::io::StreamReader::StreamReader(AVIOContext*, c10::optional<std::string> const&, c10::optional<std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > const&) + 0x43 (0x7f4d9c586283 in /data/ai-tools/venv/lib/python3.9/site-packages/torchaudio/lib/libtorchaudio_ffmpeg6.so)
frame #4: torchaudio::io::StreamReaderCustomIO::StreamReaderCustomIO(void*, c10::optional<std::string> const&, int, int (*)(void*, unsigned char*, int), long (*)(void*, long, int), c10::optional<std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > > const&) + 0x2f (0x7f4d9c58631f in /data/ai-tools/venv/lib/python3.9/site-packages/torchaudio/lib/libtorchaudio_ffmpeg6.so)
frame #5: <unknown function> + 0x17a89 (0x7f4d9c47aa89 in /data/ai-tools/venv/lib64/python3.9/site-packages/torchaudio/lib/_torchaudio_ffmpeg6.so)
frame #6: <unknown function> + 0x2de35 (0x7f4d9c490e35 in /data/ai-tools/venv/lib64/python3.9/site-packages/torchaudio/lib/_torchaudio_ffmpeg6.so)
<omitting python frames>
frame #12: <unknown function> + 0xf744 (0x7f4dc5a20744 in /data/ai-tools/venv/lib64/python3.9/site-packages/torchaudio/lib/_torchaudio.so)
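
A possible reading of this trace, not confirmed anywhere in this thread: the failure occurs in torchaudio's `StreamReaderCustomIO`, the code path used when audio is read from a file-like object rather than a filesystem path. In the reproduction code, `request.files['speaker_wav']` is an uploaded-file object (a Werkzeug `FileStorage`, assuming a standard Flask setup), and it is passed directly as `speaker_wav`. If that is the cause, the error would relate to the speaker audio rather than to the number of sentences. The sketch below shows one way to test that guess: copy the upload to a temporary file and pass its path instead. The helper names `save_upload_to_temp` and `synthesize_with_uploaded_speaker` are hypothetical and are not part of TTS or Flask.

```python
import os
import shutil
import tempfile


def save_upload_to_temp(upload, suffix=".wav"):
    """Copy a file-like upload to a temporary file and return its path.

    `upload` is assumed to be a Werkzeug FileStorage (as returned by
    request.files[...]) or any object with a .read() method.
    """
    # FileStorage exposes its underlying stream as .stream; plain
    # file-like objects are used directly.
    stream = getattr(upload, "stream", upload)
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as tmp:
        shutil.copyfileobj(stream, tmp)
    return path


def synthesize_with_uploaded_speaker(tts, text, upload, language, output_path):
    """Hypothetical wrapper: give tts_to_file a path instead of a stream."""
    speaker_path = save_upload_to_temp(upload)
    try:
        out_dir = os.path.dirname(output_path)
        if out_dir:
            os.makedirs(out_dir, exist_ok=True)
        # Same keyword arguments as the call in the reproduction code.
        tts.tts_to_file(
            text,
            speaker_wav=speaker_path,
            language=language,
            file_path=output_path,
        )
    finally:
        os.remove(speaker_path)  # clean up the temporary speaker file
    return output_path
```

In the reported endpoint this would replace the call to `text_to_speech(...)`, with `tts` being the existing `TTS` instance. If the error persists with a path, the uploaded file itself may not be audio that FFmpeg can decode, which is worth checking separately.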

Environment

TTS Version: 2
AWS EC2
ffmpeg version N-112128-gfa20f5cd9e Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 11 (GCC)
  configuration: --enable-shared --disable-static --enable-gpl --enable-libfreetype
  libavutil      58. 25.100 / 58. 25.100
  libavcodec     60. 26.100 / 60. 26.100
  libavformat    60. 13.100 / 60. 13.100
  libavdevice    60.  2.101 / 60.  2.101
  libavfilter     9. 11.100 /  9. 11.100
  libswscale      7.  3.100 /  7.  3.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100

Additional context

No response

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.