get_amd_offload_arch_flag: warning: hipInfo output didn't list any graphics cards

xd2333 commented 1 month ago

Hi, I use a 6600 with ROCm 5.5 and llamafile v0.8.6, but I get "get_amd_offload_arch_flag: warning: hipInfo output didn't list any graphics cards".

I can see my card in the console:

import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++.exe not found on $PATH
get_rocm_bin_path: note: /C/Program Files/AMD/ROCm/5.5//bin/amdclang++.exe does not exist
get_rocm_bin_path: note: clang++.exe not found on $PATH
get_rocm_bin_path: note: hipInfo.exe not found on $PATH
llamafile_log_command: "/C/Program Files/AMD/ROCm/5.5//bin/hipInfo.exe"

device# 0 Name: AMD Radeon RX 6600 XT pciBusID: 43 pciDeviceID: 0 pciDomainID: 0 multiProcessorCount: 16 maxThreadsPerMultiProcessor: 2048 isMultiGpuBoard: 0 clockRate: 2382 Mhz memoryClockRate: 1000 Mhz memoryBusWidth: 0 totalGlobalMem: 7.98 GB totalConstMem: 2147483647 sharedMemPerBlock: 64.00 KB canMapHostMemory: 1 regsPerBlock: 0 warpSize: 32 l2CacheSize: 4194304 computeMode: 0 maxThreadsPerBlock: 1024 maxThreadsDim.x: 1024 maxThreadsDim.y: 1024 maxThreadsDim.z: 1024 maxGridSize.x: 2147483647 maxGridSize.y: 2147483647 maxGridSize.z: 2147483647 major: 10 minor: 3 concurrentKernels: 1 cooperativeLaunch: 0 cooperativeMultiDeviceLaunch: 0 isIntegrated: 0 maxTexture1D: 16384 maxTexture2D.width: 16384 maxTexture2D.height: 16384 maxTexture3D.width: 2048 maxTexture3D.height: 2048 maxTexture3D.depth: 2048 isLargeBar: 0 asicRevision: 0 maxSharedMemoryPerMultiProcessor: 64.00 KB clockInstructionRate: 1000.00 Mhz arch.hasGlobalInt32Atomics: 1 arch.hasGlobalFloatAtomicExch: 1 arch.hasSharedInt32Atomics: 1 arch.hasSharedFloatAtomicExch: 1 arch.hasFloatAtomicAdd: 1 arch.hasGlobalInt64Atomics: 1 arch.hasSharedInt64Atomics: 1 arch.hasDoubles: 1 arch.hasWarpVote: 1 arch.hasWarpBallot: 1 arch.hasWarpShuffle: 1 arch.hasFunnelShift: 0 arch.hasThreadFenceSystem: 1 arch.hasSyncThreadsExt: 0 arch.hasSurfaceFuncs: 0 arch.has3dGrid: 1 arch.hasDynamicParallelism: 0 gcnArchName: gfx1032 peers: non-peers: device#0 device#1

memInfo.total: 7.98 GB memInfo.free: 7.86 GB (98%)

device# 1 Name: AMD Radeon RX 6600 XT pciBusID: 43 pciDeviceID: 0 pciDomainID: 0 multiProcessorCount: 16 maxThreadsPerMultiProcessor: 2048 isMultiGpuBoard: 0 clockRate: 2382 Mhz memoryClockRate: 1000 Mhz memoryBusWidth: 0 totalGlobalMem: 7.98 GB totalConstMem: 2147483647 sharedMemPerBlock: 64.00 KB canMapHostMemory: 1 regsPerBlock: 0 warpSize: 32 l2CacheSize: 4194304 computeMode: 0 maxThreadsPerBlock: 1024 maxThreadsDim.x: 1024 maxThreadsDim.y: 1024 maxThreadsDim.z: 1024 maxGridSize.x: 2147483647 maxGridSize.y: 2147483647 maxGridSize.z: 2147483647 major: 10 minor: 3 concurrentKernels: 1 cooperativeLaunch: 0 cooperativeMultiDeviceLaunch: 0 isIntegrated: 0 maxTexture1D: 16384 maxTexture2D.width: 16384 maxTexture2D.height: 16384 maxTexture3D.width: 2048 maxTexture3D.height: 2048 maxTexture3D.depth: 2048 isLargeBar: 0 asicRevision: 0 maxSharedMemoryPerMultiProcessor: 64.00 KB clockInstructionRate: 1000.00 Mhz arch.hasGlobalInt32Atomics: 1 arch.hasGlobalFloatAtomicExch: 1 arch.hasSharedInt32Atomics: 1 arch.hasSharedFloatAtomicExch: 1 arch.hasFloatAtomicAdd: 1 arch.hasGlobalInt64Atomics: 1 arch.hasSharedInt64Atomics: 1 arch.hasDoubles: 1 arch.hasWarpVote: 1 arch.hasWarpBallot: 1 arch.hasWarpShuffle: 1 arch.hasFunnelShift: 0 arch.hasThreadFenceSystem: 1 arch.hasSyncThreadsExt: 0 arch.hasSurfaceFuncs: 0 arch.has3dGrid: 1 arch.hasDynamicParallelism: 0 gcnArchName: gfx1032 peers: non-peers: device#0 device#1

memInfo.total: 7.98 GB memInfo.free: 7.86 GB (98%)

get_amd_offload_arch_flag: warning: hipInfo output didn't list any graphics cards
extract_cuda_dso: note: prebuilt binary /zip/ggml-rocm.dll not found
get_nvcc_path: note: nvcc.exe not found on $PATH
get_nvcc_path: note: $CUDA_PATH/bin/nvcc.exe does not exist
get_nvcc_path: note: /opt/cuda/bin/nvcc.exe does not exist
get_nvcc_path: note: /usr/local/cuda/bin/nvcc.exe does not exist
link_cuda_dso: note: dynamically linking /C/Users/c/.llamafile/v/0.8.6/ggml-cuda.dll
link_cuda_dso: warning: library not found: failed to load library
{"function":"server_params_parse","level":"WARN","line":2424,"msg":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1,"tid":"11820704","timestamp":1716686526}
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
{"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2856,"msg":"build info","tid":"11820704","timestamp":1716686526}

jeromew commented 1 month ago

Hello, I have the same problem on my Windows 11 box.

I modified cuda.c, where hipInfo.exe is launched, to capture its output and understand what is happening.

There are two parts that can be isolated:

the posix_spawn steps that launch hipInfo.exe and capture its output
the parsing steps that analyze the captured output in order to find the gfxXXXX references of AMD graphics cards in the system
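
To make the first part concrete, here is a minimal C sketch of spawning a child and capturing its stdout through a pipe. It is an illustration under assumptions, not the code in cuda.c: the helper name capture_stdout is hypothetical, and it relies only on standard POSIX calls (pipe, posix_spawn, read, waitpid).

```c
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

// Hypothetical helper (not llamafile's actual code): runs argv[0] with its
// stdout redirected into a pipe and copies what it prints into buf as a
// NUL-terminated string. Returns the number of bytes captured, or -1.
static ssize_t capture_stdout(char *const argv[], char *buf, size_t cap) {
  assert(cap > 0);
  int fds[2];
  if (pipe(fds) == -1) return -1;

  // In the child: make the pipe's write end stdout, then drop both pipe fds.
  posix_spawn_file_actions_t fa;
  if (posix_spawn_file_actions_init(&fa) != 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
  posix_spawn_file_actions_addclose(&fa, fds[0]);
  posix_spawn_file_actions_addclose(&fa, fds[1]);

  pid_t pid;
  int err = posix_spawn(&pid, argv[0], &fa, NULL, argv, environ);
  posix_spawn_file_actions_destroy(&fa);
  close(fds[1]);  // the parent must close the write end, or read() never sees EOF
  if (err != 0) {
    close(fds[0]);
    return -1;
  }

  // Read until EOF or until the buffer is full.
  size_t used = 0;
  ssize_t n;
  while (used < cap - 1 &&
         (n = read(fds[0], buf + used, cap - 1 - used)) > 0) {
    used += (size_t)n;
  }
  buf[used] = '\0';
  close(fds[0]);

  int status;
  if (waitpid(pid, &status, 0) == -1) return -1;
  return (ssize_t)used;
}
```

Calling it with an argv such as { "/bin/sh", "-c", "printf gfx1031", NULL } should leave "gfx1031" in buf. If the read loop never receives any bytes even though the child visibly prints to the console, the redirection set up by the file actions did not reach the child, which matches the symptom described in this comment.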

After some logging and tests that replaced hipInfo.exe with another executable I created, I observed that:

the parsing seems to be correct and is not responsible for the issue
the posix_spawn steps and stdout capture work when launching, for example, { "/C/bin/echo.exe", "gfx1031", 0 }. The gfxXXXX string is correctly parsed
the posix_spawn steps for hipInfo.exe do launch hipInfo.exe, since we can see its output in the console, but they do not capture its output. The code never enters the while loop

I checked whether hipInfo.exe sends its output to stderr instead of stdout, but this is not the case.
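
For reference, the parsing step can be sketched as follows. This is a hedged illustration, not the logic in get_amd_offload_arch_flag: the names collect_gfx_names and list_has are hypothetical, and the comma-separated output format is an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns 1 if the comma-separated list already
// contains the name made of the first len bytes at name.
static int list_has(const char *list, const char *name, size_t len) {
  for (const char *q = list; *q;) {
    size_t n = strcspn(q, ",");
    if (n == len && memcmp(q, name, len) == 0) return 1;
    q += n;
    if (*q == ',') q++;
  }
  return 0;
}

// Hypothetical helper: scans text for tokens of the form "gfx" followed by
// letters or digits (as in "gcnArchName: gfx1032") and writes each distinct
// name into out as a comma-separated list. Returns how many names it stored.
static int collect_gfx_names(const char *text, char *out, size_t cap) {
  assert(cap > 0);
  int count = 0;
  size_t used = 0;
  out[0] = '\0';
  for (const char *p = text; (p = strstr(p, "gfx")) != NULL;) {
    size_t len = 3;
    while (isalnum((unsigned char)p[len])) len++;
    if (len > 3 && !list_has(out, p, len)) {
      size_t need = len + (count ? 1 : 0);  // name plus separator
      if (used + need + 1 > cap) break;     // keep room for the NUL
      if (count) out[used++] = ',';
      memcpy(out + used, p, len);
      used += len;
      out[used] = '\0';
      count++;
    }
    p += len;
  }
  return count;
}
```

Fed the hipInfo output posted at the top of this issue, where both devices report gfx1032, this sketch would store a single name. An empty result therefore points at the captured text being empty rather than at the scan itself.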

At this stage I don't understand why hipInfo.exe's output is not captured.

I also do not understand why this would be a new issue; we should probably look at older llamafile versions to see if we can find one that does capture the output of hipInfo.exe on Windows.

Or maybe it was developed and tested on Linux only and never worked on Windows, an issue that would have been hidden by the prebuilt ggml-rocm.dll that was shipped in versions 0.6 through 0.8.4.

jeromew commented 1 month ago

Hello @rasmith, I saw that you recently worked on cuda.c / get_amd_offload_arch_flag to improve the parsing of AMD GPUs.

May I ask whether you were working on Linux or Windows? It seems that on Windows (at least on recent versions) the posix_spawn steps are not capturing the output of hipInfo.exe, so I was wondering if you also observed that. Thank you for your help.

rasmith commented 1 month ago

I was working on Linux, so I didn't see this problem. It's probably best to use CreateProcess on Windows. There's a nice writeup here for how to do it: https://learn.microsoft.com/en-us/windows/win32/procthread/creating-a-child-process-with-redirected-input-and-output?redirectedfrom=MSDN

jeromew commented 1 month ago

Thanks for the info.

The MSDN snippet could probably do the trick, but I am not sure it would fit llamafile's overall coding patterns, as APIs that are available in Cosmopolitan are probably preferred. The way the initial version of get_amd_offload_arch_flag was coded suggests that the author expected it to work on Windows when built with Cosmopolitan.

Before contacting the owner about this, I'll test previous versions of llamafile to see whether the bug was already happening.

G4Vi commented 1 month ago

Cosmopolitan v3.3.7 through v3.3.9 (since https://github.com/jart/cosmopolitan/commit/cf70a4475651d9864f6a16b88ebc6930659d0898) has a bug in ntspawn/posix_spawn/exec on Windows. This was fixed in Cosmopolitan v3.3.10 with https://github.com/jart/cosmopolitan/pull/1190

I wonder if the issues capturing output from hipInfo.exe are related, as llamafile v0.8.6 was built with Cosmopolitan v3.3.8.

jeromew commented 1 month ago

I recompiled llamafile with Cosmopolitan v3.3.10 and it does fix the issue. Thank you @G4Vi for spotting and fixing this in https://github.com/jart/cosmopolitan/pull/1190

Mozilla-Ocho / llamafile

get_amd_offload_arch_flag: warning: hipInfo output didn't list any graphics cards #446
