coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Docker Image configuration error when running TTS server. #3454

Closed EvarDion closed 9 months ago

EvarDion commented 10 months ago

Describe the bug

VITS is working fine but a number of other multilingual models are failing to run because of a configuration issue.

A partial list of the models that don't work:

tts_models/multilingual/multi-dataset/xtts_v2
tts_models/multilingual/multi-dataset/bark
tts_models/en/multi-dataset/tortoise-v2

To Reproduce

Download and run the Docker image on Windows 10, following the tutorial instructions here:

The setting I used was GPU = true.

Expected behavior

Models should run.

Logs

StackTrace:

root@709cd4fb2c7c:~# python3 TTS/server/server.py --use_cuda true --model_name tts_models/multilingual/multi-dataset/xtts_v2
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Traceback (most recent call last):
  File "/root/TTS/server/server.py", line 104, in <module>
    synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 183, in _load_tts
    self.tts_config = load_config(tts_config_path)
  File "/root/TTS/config/__init__.py", line 82, in load_config
    ext = os.path.splitext(config_path)[1]
  File "/usr/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
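
The last frames show `load_config` handing its `config_path` straight to `os.path.splitext`, so the same error can be reproduced without TTS whenever that path is `None`. A minimal sketch, where `extension_of` is a hypothetical stand-in for that single line and not the library's code:

```python
import os


def extension_of(config_path):
    # Hypothetical stand-in for the failing line in load_config.
    return os.path.splitext(config_path)[1]


print(extension_of("config.json"))  # a real path is handled normally

try:
    extension_of(None)  # no config path was resolved for the model
except TypeError as err:
    print("TypeError:", err)
```

This suggests the problem is a missing config path for these models rather than a broken download.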

Environment

"CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3060"
        ],
        "available": true,
        "version": "11.8"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1+cu118",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "x86_64",
        "python": "3.10.12",
        "version": "#1 SMP Thu Oct 5 21:02:42 UTC 2023"
    }

Additional context

I did a git clone of the latest repo into the Docker container and reinstalled all of the dependencies, and the error still occurs, so I'm guessing it's still an unresolved issue.


EvarDion commented 10 months ago

I managed to get the config to load properly by adding the following code at line 105 of TTS/server/server.py:

# Check the model path for a config file if none is supplied.
if config_path is None:
    print("looking for config in:", model_path)
    model_config_path = os.path.join(model_path, "config.json")
    print("model_config_path:", model_config_path)
    if os.path.exists(model_config_path):
        config_path = model_config_path
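
The same fallback can be written as a small standalone function, which is easier to try outside the server. This is only a sketch: `resolve_config_path` is a hypothetical name, and it assumes `model_path` points at the model's download directory, as it does for xtts_v2 in this thread.

```python
import os


def resolve_config_path(model_path, config_path=None):
    """Return config_path, falling back to <model_path>/config.json if it exists."""
    # Keep an explicitly supplied config; without a model path there is nothing to search.
    if config_path is not None or model_path is None:
        return config_path
    candidate = os.path.join(model_path, "config.json")
    return candidate if os.path.exists(candidate) else None
```

If no config.json is found the function still returns `None`, so the caller should report a clear error instead of passing `None` on to `load_config`.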

UPDATE: The Web UI does not load the speaker IDs for xtts_v2, bark and tortoise-v2 so I guess this is a feature that is still a work in progress.

TEMPORARY FIX: (how to call xtts_v2 with an HTTP GET request)

Example GET request:

  http://[::1]:5002/api/tts?text=Hello%20how%20are%20you%20today.%20I%20am%20a%20robot.%20How%20may%20I%20help%20you%3F&speaker_id=Daisy%20Studious&style_wav=&language_id=en

Command for listing speaker IDs:

    tts --list_speaker_idxs --model_name   tts_models/multilingual/multi-dataset/xtts_v2
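
For other texts or speakers, the request URL can be built in code so the percent-encoding is not done by hand. A minimal sketch using only the standard library; `build_tts_url` is a hypothetical helper, and the base URL and parameter names are copied from the example request above:

```python
from urllib.parse import quote, urlencode


def build_tts_url(text, speaker_id, language_id, base="http://[::1]:5002/api/tts"):
    # quote_via=quote encodes spaces as %20, matching the example URL.
    params = {
        "text": text,
        "speaker_id": speaker_id,
        "style_wav": "",
        "language_id": language_id,
    }
    return base + "?" + urlencode(params, quote_via=quote)


url = build_tts_url(
    "Hello how are you today. I am a robot. How may I help you?",
    "Daisy Studious",
    "en",
)
print(url)
```

Swapping in another name from the speaker list gives the URL for that voice.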

relesssar commented 10 months ago

Same problem.

python3 TTS/server/server.py --model_name tts_models/multilingual/multi-dataset/xtts_v2
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Traceback (most recent call last):
  File "/root/TTS/server/server.py", line 104, in <module>
    synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 183, in _load_tts
    self.tts_config = load_config(tts_config_path)
  File "/root/TTS/config/__init__.py", line 82, in load_config
    ext = os.path.splitext(config_path)[1]
  File "/usr/local/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

EvarDion commented 10 months ago

> Same problem.

If you want to run the server, you just need to edit server.py and manually apply the code fix from my comment above. The UI does not list the speaker_ids, though, so you will need to construct an HTTP GET request on your own if you want to hear all the voices (see the examples above).

djdookie commented 10 months ago

Temporary fix works for me.

Any ideas if voice cloning (--speaker_wav /path/to/sample.wav) also works via tts-server? If yes and we can successfully use this parameter in the GET-request, where should the sample.wav be stored?

EvarDion commented 10 months ago

> Any ideas if voice cloning (--speaker_wav /path/to/sample.wav) also works via tts-server? If yes and we can successfully use this parameter in the GET-request, where should the sample.wav be stored?

Sorry, I have not tried it yet.

stale[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

strevg commented 7 months ago

Same issue on a MacBook M1 Pro:

python3 TTS/server/server.py --model_name tts_models/multilingual/multi-dataset/xtts_v2
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Traceback (most recent call last):
  File "/root/TTS/server/server.py", line 104, in <module>
    synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 183, in _load_tts
    self.tts_config = load_config(tts_config_path)
  File "/root/TTS/config/__init__.py", line 82, in load_config
    ext = os.path.splitext(config_path)[1]
  File "/usr/local/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

chiefMarlin commented 6 months ago

Same issue

MP242 commented 5 months ago

hey,

After downloading the model with the command below, rerun the command with model_path and config_path.

python3 TTS/server/server.py --model_name tts_models/multilingual/multi-dataset/xtts_v2
python3 TTS/server/server.py \
    --model_path ~/.local/share/tts/tts_models/multilingual/multi-dataset/xtts_v2 \
    --config_path ~/.local/share/tts/tts_models/multilingual/multi-dataset/xtts_v2/config.json

kopp commented 2 months ago

In the latest Docker image (ghcr.io/coqui-ai/tts-cpu from 2024-09-01) the paths changed, so now run:

tts-server \
  --model_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2 \
  --config_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json
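
Since two directory layouts have been reported in this thread (nested folders, and a single folder joined with `--`), a script can check both before starting the server. A rough sketch: `find_model_dir` and `server_args` are hypothetical helpers, and the default root is the path used in the commands above.

```python
import os


def find_model_dir(model_name, root="~/.local/share/tts"):
    """Return the first existing download directory for model_name, or None."""
    root = os.path.expanduser(root)
    candidates = [
        # Nested layout, as in the earlier comment.
        os.path.join(root, *model_name.split("/")),
        # Single folder joined with "--", as in the 2024-09-01 image.
        os.path.join(root, model_name.replace("/", "--")),
    ]
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def server_args(model_dir):
    # Flags taken from the commands in this thread.
    return [
        "tts-server",
        "--model_path", model_dir,
        "--config_path", os.path.join(model_dir, "config.json"),
    ]
```

The list returned by `server_args` can be passed to `subprocess.run` once `find_model_dir` has returned a directory.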

stevenlafl commented 1 month ago

Yeah, it now needs the speakers file as well (passed with --speakers_file_path in the commands here).

# docker
docker run --name coqui --rm -it -p 5002:5002 --gpus all -v ./tts:/root/.local/share --entrypoint /bin/bash ghcr.io/coqui-ai/tts

tts-server \
  --model_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2 \
  --config_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json \
  --speakers_file_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/speakers_xtts.pth \
  --use_cuda true

or docker compose:

services:
  coqui:
    container_name: coqui
    image: ghcr.io/coqui-ai/tts
    build:
      context: ./TTS
    ports:
      - 5002:5002
    environment:
      - COQUI_TOS_AGREED=1
    #entrypoint: ["python3", "TTS/server/server.py", "--model_name", "tts_models/multilingual/multi-dataset/xtts_v2", "--use_cuda", "true"]
    entrypoint:
      - "/bin/bash"
      - "-c"
      - "tts-server --model_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2 --config_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json --speakers_file_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/speakers_xtts.pth --use_cuda true"
    volumes:
      - ./tts:/root/.local/share
    deploy:
      resources:
        reservations:
          devices:
            - count: all # reserves all GPUs; use a number such as `count: 1` to limit it
              capabilities: [gpu]

Then you can request speech with a URL like this:

http://[::1]:5002/api/tts?text=Hello%20how%20are%20you%20today.%20I%20am%20a%20robot.%20How%20may%20I%20help%20you%3F&speaker_id=Daisy%20Studious&style_wav=&language_id=en

This URL uses "Daisy Studious", but the full list of speakers is:

[
  'Claribel Dervla',   'Daisy Studious',     'Gracie Wise',
  'Tammie Ema',        'Alison Dietlinde',   'Ana Florence',
  'Annmarie Nele',     'Asya Anara',         'Brenda Stern',
  'Gitta Nikolina',    'Henriette Usha',     'Sofia Hellen',
  'Tammy Grit',        'Tanja Adelina',      'Vjollca Johnnie',
  'Andrew Chipper',    'Badr Odhiambo',      'Dionisio Schuyler',
  'Royston Min',       'Viktor Eka',         'Abrahan Mack',
  'Adde Michal',       'Baldur Sanjin',      'Craig Gutsy',
  'Damien Black',      'Gilberto Mathias',   'Ilkin Urbano',
  'Kazuhiko Atallah',  'Ludvig Milivoj',     'Suad Qasim',
  'Torcull Diarmuid',  'Viktor Menelaos',    'Zacharie Aimilios',
  'Nova Hogarth',      'Maja Ruoho',         'Uta Obando',
  'Lidiya Szekeres',   'Chandra MacFarland', 'Szofi Granger',
  'Camilla Holmström', 'Lilya Stainthorpe',  'Zofija Kendrick',
  'Narelle Moon',      'Barbora MacLean',    'Alexandra Hisakawa', 
  'Alma María',        'Rosemary Okafor',    'Ige Behringer', 
  'Filip Traverse',    'Damjan Chapman',     'Wulf Carlevaro', 
  'Aaron Dreschner',   'Kumar Dahl',         'Eugenio Mataracı',
  'Ferran Simen',      'Xavier Hayasaka',    'Luis Moray',
  'Marcos Rudaski'
]

Thanks @EvarDion @kopp for parts of this

Note: if you get AttributeError: 'NoneType' object has no attribute 'name_to_id', it's because the server doesn't like quotes around the speaker name. Pass the name without quotes, as shown in the URL above.
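
If speaker names are copied from a quoted list like the one above, the quotes can be stripped before the name goes into the URL. A small sketch, assuming the error really does come from quote characters wrapped around the name; `clean_speaker_id` is a hypothetical helper:

```python
def clean_speaker_id(raw):
    """Strip whitespace and one layer of matching surrounding quotes."""
    name = raw.strip()
    if len(name) >= 2 and name[0] == name[-1] and name[0] in "\"'":
        name = name[1:-1].strip()
    return name


print(clean_speaker_id("'Daisy Studious'"))
```

Names without surrounding quotes pass through unchanged, including ones with non-ASCII characters.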