ggerganov / llama.cpp

LLM inference in C/C++

Bug: ggml_vulkan can only find 1 Vulkan device #9716

Open hpx502766238 opened 3 days ago

hpx502766238 commented 3 days ago

What happened?

I have two Vulkan devices, NVIDIA GeForce RTX 3060 Laptop GPU (NVIDIA) and AMD Radeon(TM) Graphics (AMD proprietary driver), but ggml_vulkan only finds one of them. The CLI output is:

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce RTX 3060 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
llm_load_tensors: ggml ctx size =    0.19 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/37 layers to GPU
llm_load_tensors:        CPU buffer size =  3442.89 MiB

If I disable the NVIDIA GPU in the system device manager and then start llama.cpp again, ggml_vulkan finds the other device:

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon(TM) Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64
llm_load_tensors: ggml ctx size =    0.19 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/37 layers to GPU
llm_load_tensors:        CPU buffer size =  3442.89 MiB

It seems that ggml_vulkan can only find one device at a time.

Name and Version

.\llama-cli.exe --version
version: 3865 (00b7317e) built with MSVC 19.29.30154.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

piDack commented 3 days ago

Firstly, this is not a bug. llama.cpp first checks for a dedicated graphics card; only if none is found does it fall back to the integrated graphics card. The code for this is at https://github.com/ggerganov/llama.cpp/blob/a39ab216aa624308fda7fa84439c6b61dc98b87a/ggml/src/ggml-vulkan.cpp#L2134. If you want to force the integrated graphics card to be used instead, you can try setting the GGML_VK_VISIBLE_DEVICES environment variable. On Windows, that would be set GGML_VK_VISIBLE_DEVICES=0 or set GGML_VK_VISIBLE_DEVICES=1, depending on your system.
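Roughly, the default selection behaves like the sketch below. This is only a simplified illustration of the behavior described above, not the actual ggml-vulkan.cpp code (which also handles the GGML_VK_VISIBLE_DEVICES override), and the function name is made up: dedicated GPUs are collected first, and the integrated GPU is used only as a fallback when no dedicated GPU exists.

#include <vulkan/vulkan.h>

#include <vector>

// Simplified sketch of the described selection policy: prefer discrete GPUs,
// fall back to the first (typically integrated) device only if none are found.
static std::vector<uint32_t> pick_default_devices(const std::vector<VkPhysicalDevice> & devices) {
    std::vector<uint32_t> selected;
    for (uint32_t i = 0; i < (uint32_t) devices.size(); i++) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(devices[i], &props);
        if (props.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU) {
            selected.push_back(i);  // dedicated GPU found: register it
        }
    }
    if (selected.empty() && !devices.empty()) {
        selected.push_back(0);      // no dedicated GPU at all: fall back to the first device
    }
    return selected;
}

Since the RTX 3060 is the only discrete adapter in this system, it is the only device that gets registered unless the environment variable overrides the list.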

hpx502766238 commented 2 days ago

Thank you. Setting the temporary environment variable GGML_VK_VISIBLE_DEVICES does work, but it's not precise enough for my needs. I would like llamacpp to be able to display all available devices and their corresponding device IDs through the command line. This way, I can manually select which GPU to use when invoking llamacpp for inference.

piDack commented 2 days ago

What about the 'main-gpu' option?

hpx502766238 commented 1 day ago

You still haven't quite understood my intention. What I mean is that I would like to use a separate command to display all available Vulkan devices and their corresponding device IDs first. For example:

Vulkan0: NVIDIA GeForce RTX 3060 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
Vulkan1: AMD Radeon(TM) Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64

Then, I want to be able to execute a command to select the corresponding device ID for inference.

Regarding the -mg (main-gpu) parameter you mentioned, I have tried it, but it does not work. For instance, when I set -mg 1, it gives an error:

ERROR:             vkDestroyFence: Invalid device [VUID-vkDestroyFence-device-parameter]

This suggests that the -mg parameter might only work with devices that are already in the list found by ggml_vulkan, and since currently only Vulkan0 is recognized, it results in an error.

The key point is to display all available Vulkan devices and their corresponding device IDs, and not to let ggml_vulkan filter them out.
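As an illustration of the kind of listing meant here, a minimal standalone Vulkan program (not part of llama.cpp, just a sketch) can print every adapter with an index, regardless of whether it is dedicated or integrated:

#include <vulkan/vulkan.h>

#include <cstdio>
#include <vector>

int main() {
    // Create a bare Vulkan instance; no extensions are needed just to enumerate devices.
    VkApplicationInfo app_info = {};
    app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app_info.apiVersion = VK_API_VERSION_1_1;

    VkInstanceCreateInfo create_info = {};
    create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    create_info.pApplicationInfo = &app_info;

    VkInstance instance;
    if (vkCreateInstance(&create_info, nullptr, &instance) != VK_SUCCESS) {
        fprintf(stderr, "failed to create Vulkan instance\n");
        return 1;
    }

    // List every physical device the loader reports, with its index and type.
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (uint32_t i = 0; i < count; i++) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(devices[i], &props);
        const char * type =
            props.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU   ? "discrete" :
            props.deviceType == VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU ? "integrated" : "other";
        printf("Vulkan%u: %s (%s)\n", i, props.deviceName, type);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}

Building it only requires linking against the Vulkan loader (e.g. vulkan-1.lib on Windows).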

piDack commented 1 day ago

I understand what you mean. You only need to modify a small amount of code in ggml_vk_instance_init to get that behavior: list all GPUs by removing the check at https://github.com/ggerganov/llama.cpp/blob/a39ab216aa624308fda7fa84439c6b61dc98b87a/ggml/src/ggml-vulkan.cpp#L2134, and then assign the value of mg at https://github.com/ggerganov/llama.cpp/blob/a39ab216aa624308fda7fa84439c6b61dc98b87a/ggml/src/ggml-vulkan.cpp#L2206. I don't think it's difficult.
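A hypothetical sketch of that change (illustrative names only, not the actual llama.cpp code): keep every enumerated device instead of only dedicated GPUs, so the index passed via -mg/--main-gpu maps onto the full list rather than onto a device that was never registered.

#include <vulkan/vulkan.h>

#include <cstdio>
#include <vector>

// Hypothetical variant of the device setup: no discrete-only filter, every
// adapter gets an index, and main_gpu selects from the complete list.
static std::vector<uint32_t> collect_all_devices(const std::vector<VkPhysicalDevice> & devices) {
    std::vector<uint32_t> ids;
    for (uint32_t i = 0; i < (uint32_t) devices.size(); i++) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(devices[i], &props);
        printf("Vulkan%u: %s\n", i, props.deviceName);  // report every device, dedicated or not
        ids.push_back(i);
    }
    return ids;
}

static uint32_t resolve_main_gpu(const std::vector<uint32_t> & ids, uint32_t main_gpu) {
    // With all devices registered, -mg 1 refers to the second entry of the list
    // instead of an index ggml_vulkan never initialized (the likely source of
    // the vkDestroyFence error above).
    return main_gpu < ids.size() ? ids[main_gpu] : 0;
}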