Closed limingchina closed 4 days ago
I think you need CUDA device...
Does it mean I can't run it without Nvidia graphics card?
OK. I set the CUDA_PATH environment variable to "C:\Users\username\Downloads\fish-speech\fishenv\env\lib\site-packages\triton\backends\nvidia". The Triton error no longer appears. However, later I saw the following error:
2024-07-07 19:42:07.597 | INFO | __main__:clean_infer_cache:146 - C:\Users\china\AppData\Local\Temp\gradio was not found
2024-07-07 19:43:02.397 | INFO | __main__:<module>:451 - Loading Llama model...
Exception in thread Thread-6 (worker):
Traceback (most recent call last):
  File "C:\Users\china\Downloads\fish-speech\fishenv\env\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\china\Downloads\fish-speech\fishenv\env\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\china\Downloads\fish-speech\tools\llama\generate.py", line 557, in worker
    model, decode_one_token = load_model(
  File "C:\Users\china\Downloads\fish-speech\tools\llama\generate.py", line 346, in load_model
    model = model.to(device=device, dtype=precision)
  File "C:\Users\china\Downloads\fish-speech\fishenv\env\lib\site-packages\torch\nn\modules\module.py", line 1137, in to
    device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
RuntimeError: Device string must not be empty
I guess it does require an Nvidia graphics card. Can you confirm? If so, maybe you can use this issue to improve the requirements section of the documentation and mention this explicitly.
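The "Device string must not be empty" error means an empty string reached `model.to()`, which suggests the device variable is left unset on machines without CUDA. A minimal sketch of a guarded fallback (`pick_device` is a hypothetical helper for illustration, not part of fish-speech's actual code):

```python
def pick_device(cuda_available: bool) -> str:
    # Fall back to CPU when no CUDA device is present, so the
    # device string passed to model.to() is never empty.
    return "cuda" if cuda_available else "cpu"

# Hypothetical usage inside load_model (assumed names, for illustration):
#   device = pick_device(torch.cuda.is_available())
#   model = model.to(device=device, dtype=precision)
print(pick_device(False))  # -> cpu
```

Whether fish-speech can actually run CPU-only depends on the rest of the pipeline; this only shows how the empty-device crash itself could be avoided.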
Describe the bug
Can't open the inference server.
To Reproduce
Expected behavior
The inference web UI should be shown at http://127.0.0.1:7860
Actual behavior
No inference web UI is shown; the inference service is not running.
Screenshots / log
It seems Triton complains that it can't find the CUDA lib. However, according to Nvidia's documentation (https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/getting_started/quickstart.html#run-on-cpu-only-system), Triton should be able to run without a GPU as well.
The Python stack trace appears twice in the following log. The first occurrence happens when starting the web UI, and I can still see that webpage. But when I go to the "inference" tab on the page and click "open inference server", the same stack trace appears and the inference server webpage is not shown.
Additional context
Windows 11, Intel integrated graphics card, latest master code.