CrazyMonkeyCM2 opened 5 months ago
Another issue: if you use a different CUDA device (e.g. `cuda:1`) with DeepSpeed, the device isn't passed along, so DeepSpeed tries to use `cuda:0`.
What I did to work around this was: set `CUDA_VISIBLE_DEVICES=9` and use `-d="cuda"`. Don't set a specific device like `-d="cuda:9"` together with the env var, or I get an error. Just filter your devices with the `CUDA_VISIBLE_DEVICES` env var.
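The reason this workaround functions is that `CUDA_VISIBLE_DEVICES` remaps device indices before the process sees them, so the filtered GPU shows up as `cuda:0` inside the process and code that hard-codes device 0 still lands on the right card. A minimal sketch of that remapping (`in_process_index` is a hypothetical helper for illustration, not part of xtts-api-server or DeepSpeed):

```python
import os

# Restrict the process to one physical GPU before any CUDA library initializes.
# Inside the process, that GPU is then visible as cuda:0, so code that
# hard-codes device 0 still ends up on the intended card.
os.environ["CUDA_VISIBLE_DEVICES"] = "9"

def in_process_index(physical_index: int, visible: str) -> int:
    """Return the index a process sees for a physical GPU after
    CUDA_VISIBLE_DEVICES filtering (visible is the env var's value)."""
    ordering = [int(x) for x in visible.split(",")]
    return ordering.index(physical_index)

print(in_process_index(9, "9"))    # physical GPU 9 becomes cuda:0
print(in_process_index(3, "2,3"))  # physical GPU 3 becomes cuda:1
```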
Traceback (most recent call last):
File "D:\Projects\ai\xtts-api-server\.conda\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Projects\ai\xtts-api-server\.conda\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Projects\ai\xtts-api-server\xtts_api_server\__main__.py", line 46, in <module>
from xtts_api_server.server import app
File "D:\Projects\ai\xtts-api-server\xtts_api_server\server.py", line 74, in <module>
XTTS.load_model()
File "D:\Projects\ai\xtts-api-server\xtts_api_server\tts_funcs.py", line 187, in load_model
self.load_local_model(load = is_official_model)
File "D:\Projects\ai\xtts-api-server\xtts_api_server\tts_funcs.py", line 210, in load_local_model
self.model.to(self.device)
File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 1160, in to
return self._apply(convert)
File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
module._apply(fn)
File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
module._apply(fn)
File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
module._apply(fn)
File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 833, in _apply
param_applied = fn(param)
File "D:\Projects\ai\xtts-api-server\.conda\lib\site-packages\torch\nn\modules\module.py", line 1158, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
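The `invalid device ordinal` error means the requested index is greater than or equal to the number of GPUs the process can see. A hedged sketch of a guard the server could apply before `model.to(device)` (my own illustration, not the actual xtts-api-server code; in real use `device_count` would come from `torch.cuda.device_count()`, passed as a parameter here so the sketch runs anywhere):

```python
def resolve_device(requested: str, device_count: int) -> str:
    """Validate a requested CUDA ordinal against the visible device count,
    falling back to CPU instead of crashing with 'invalid device ordinal'."""
    if requested.startswith("cuda:"):
        ordinal = int(requested.split(":", 1)[1])
        if ordinal >= device_count:
            return "cpu"  # or raise a clearer, earlier error
    return requested

print(resolve_device("cuda:9", 2))  # -> "cpu": only cuda:0 and cuda:1 exist
print(resolve_device("cuda:1", 2))  # -> "cuda:1": valid, passed through
```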
I have two graphics cards in my system: `cuda:0` is for the LLM, and I am trying to use `cuda:1` for XTTS. Without `--streaming-mode` or `--streaming-mode-improve` it will use the CPU or `cuda:1` just fine, but if either flag is used it seems to be hard-coded to `cuda:0` for some reason. There are no errors to post; it works either way, but it's causing slow LLM speeds from low VRAM. Here is how I am launching it (from PC, Windows, using venv): `call venv\Scripts\activate` then `python -m xtts_api_server --streaming-mode-improve --stream-play-sync --device cuda:1`
Further proof it is being overridden: `python -m xtts_api_server --device cuda:9` gives `RuntimeError: CUDA error: invalid device ordinal`, since I have no `cuda:9`. But `python -m xtts_api_server --streaming-mode-improve --stream-play-sync --device cuda:9` runs with no error, on `cuda:0` anyway.
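That observation is consistent with the streaming path building its own device string and dropping the requested ordinal. A sketch of the suspected pattern (my guess at the shape of the bug, not the actual xtts_api_server code):

```python
def pick_device(streaming_mode: bool, requested: str) -> str:
    """Suspected behavior: the streaming path ignores the requested device."""
    if streaming_mode:
        return "cuda"   # bare "cuda" means device 0; the ordinal is dropped
    return requested

print(pick_device(True, "cuda:9"))   # -> "cuda": no error, runs on cuda:0
print(pick_device(False, "cuda:9"))  # -> "cuda:9": invalid ordinal, crashes later
```

This would explain both symptoms at once: with streaming mode the bad `cuda:9` ordinal never reaches torch (so no error, and everything lands on `cuda:0`), while without it the ordinal is passed through and triggers the `invalid device ordinal` crash.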
I've read the "About Streaming mode" info link, so I hope I am not missing something. I tried looking at the code, but I am not a Python guy and found no obvious issue. Thanks for any info or a fix!