daswer123 / xtts-api-server

A simple FastAPI Server to run XTTSv2

Streaming modes force cuda:0 #49

Open CrazyMonkeyCM2 opened 5 months ago

CrazyMonkeyCM2 commented 5 months ago

I have two graphics cards in my system: cuda:0 is for the LLM, and I am trying to use cuda:1 for xtts. Without "--streaming-mode" or "--streaming-mode-improve" it will use the CPU or cuda:1 just fine, but if either flag is used it seems to be hardcoded to cuda:0 for some reason? There are no errors to post; it works either way, but it's causing slow LLM speeds from low VRAM. Here is how I am launching it (on a Windows PC, using a venv):

    call venv\Scripts\activate
    python -m xtts_api_server --streaming-mode-improve --stream-play-sync --device cuda:1

Further proof that the device is being overridden: this gives an error, since I have no cuda:9:

    python -m xtts_api_server --device cuda:9

which fails with "RuntimeError: CUDA error: invalid device ordinal".

But this runs with no error and uses cuda:0 anyway:

    python -m xtts_api_server --streaming-mode-improve --stream-play-sync --device cuda:9
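
That behavior would be consistent with the streaming path moving the model to a bare "cuda" device string, which PyTorch resolves to cuda:0, instead of the parsed --device value. I haven't traced the actual server code, so this is only a sketch of the suspected pattern, with stand-in names:

    import argparse
    import torch
    import torch.nn as nn

    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default="cuda")  # e.g. "cuda:1"
    args = parser.parse_args()

    model = nn.Linear(4, 4)  # stand-in for the XTTS model

    # Suspected bug: a hardcoded "cuda" resolves to cuda:0, so --device
    # is ignored and an invalid ordinal like cuda:9 never raises.
    model.to("cuda")

    # Expected: thread the parsed value through, so cuda:1 is honored
    # and cuda:9 fails fast with "invalid device ordinal".
    model.to(torch.device(args.device))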

I've read the "About Streaming mode" info link, so I hope I am not missing something. I tried looking at the code, but I am not a Python guy and found no obvious issue. Thanks for any info or a fix!

Seantourage commented 5 months ago

Another issue: if you use another CUDA device (e.g. cuda:1) with DeepSpeed, the device isn't passed along to DeepSpeed, which will try to use cuda:0.
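
If that's the cause, a common workaround for libraries that bind to the process's current CUDA device is to select that device explicitly before the model and DeepSpeed engine are created. A minimal sketch, not verified against this server's DeepSpeed path:

    import torch

    # DeepSpeed inference typically places tensors on the process's
    # current CUDA device; setting it up front makes "current" mean
    # cuda:1 instead of the default cuda:0.
    torch.cuda.set_device(torch.device("cuda:1"))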

sabishii1 commented 4 months ago

What I did to work around this was to filter the visible GPUs with the CUDA_VISIBLE_DEVICES environment variable instead of the --device flag. Don't combine a specific device like -d="cuda:9" with the env var, or you get an error (traceback below):

Traceback (most recent call last):
  File "D:\Projects\ai\xtts-api-server\.conda\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Projects\ai\xtts-api-server\.conda\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Projects\ai\xtts-api-server\xtts_api_server\__main__.py", line 46, in <module>
    from xtts_api_server.server import app
  File "D:\Projects\ai\xtts-api-server\xtts_api_server\server.py", line 74, in <module>
    XTTS.load_model()
  File "D:\Projects\ai\xtts-api-server\xtts_api_server\tts_funcs.py", line 187, in load_model
    self.load_local_model(load = is_official_model)
  File "D:\Projects\ai\xtts-api-server\xtts_api_server\tts_funcs.py", line 210, in load_local_model
    self.model.to(self.device)
  File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 1160, in to
    return self._apply(convert)
  File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 833, in _apply
    param_applied = fn(param)
  File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
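
For reference, the workaround on Windows would look something like this (assuming the XTTS card is the second GPU, physical index 1). With CUDA_VISIBLE_DEVICES=1 the process sees only that card and renumbers it as cuda:0, so even a hardcoded cuda:0 in streaming mode lands on the right GPU:

    set CUDA_VISIBLE_DEVICES=1
    call venv\Scripts\activate
    python -m xtts_api_server --streaming-mode-improve --stream-play-sync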