PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

Installation fails on NAVI gpu #345

Status: Closed (Naomiusearch closed this issue 4 months ago)

Naomiusearch commented 6 months ago

Your current environment

Collecting environment information...
PyTorch version: 2.4.0.dev20240317+rocm6.0
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.0.32830-d62f6a171
OS: Arch Linux (x86_64)
GCC version: (GCC) 13.2.1 20230801
Clang version: 17.0.6
CMake version: Could not collect 
Libc version: glibc-2.39
Python version: 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801] (64-bit runtime)
Python platform: Linux-6.7.10_1-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Radeon RX 7900 XTX (gfx1100)
Nvidia driver version: Could not collect 
cuDNN version: Could not collect 
HIP runtime version: 6.0.32830
MIOpen runtime version: 3.0.0
Is XNNPACK available: True
CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               16
On-line CPU(s) list:                  0-15
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen 7 5800X 8-Core Processor
CPU family:                           25
Model:                                33
Thread(s) per core:                   2
Core(s) per socket:                   8
Socket(s):                            1
Stepping:                             0
Frequency boost:                      enabled
CPU(s) scaling MHz:                   60%
CPU max MHz:                          4850.1948
CPU min MHz:                          2200.0000
BogoMIPS:                             7600.02
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization:                       AMD-V
L1d cache:                            256 KiB (8 instances)
L1i cache:                            256 KiB (8 instances)
L2 cache:                             4 MiB (8 instances)
L3 cache:                             32 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-15
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-triton-rocm==3.0.0+0a22a91d04
[pip3] torch==2.4.0.dev20240317+rocm6.0
[conda] Could not collect
ROCM Version: 6.0.32831-204d35d16
Aphrodite Version: N/A
Aphrodite Build Flags:
CUDA Archs: Not Set; ROCm: Disabled

How did you install Aphrodite?

HIP_VISIBLE_DEVICES=1 MAX_JOBS=4 python setup.py install

The build fails with the following compiler errors:

/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:1581:28: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:1656:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:1730:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:1835:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:1926:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:2003:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:2051:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:2108:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:2224:28: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:2225:16: error: use of undeclared identifier '__shfl_xor_sync'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3771:33: error: use of undeclared identifier '__vcmpeq4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3772:33: error: use of undeclared identifier '__vcmpeq4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3773:28: error: use of undeclared identifier '__vsub4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3774:28: error: use of undeclared identifier '__vsub4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3782:33: error: use of undeclared identifier '__vcmpeq4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3783:33: error: use of undeclared identifier '__vcmpeq4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3784:28: error: use of undeclared identifier '__vsub4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3785:28: error: use of undeclared identifier '__vsub4'
/home/user/aphrodite-engine/kernels/quantization/gguf/gguf_kernel.hip:3808:28: error: use of undeclared identifier '__vsub4'
/home/user/aphrodite-engine/kernels/quantization/exl2/q_gemm_exl2.hip:120:9: error: no matching function for call to 'hipblasHgemm'
/home/user/aphrodite-engine/kernels/attention/../quantization/int8_kvcache/quant_utils_hip.cuh:210:12: error: no viable conversion from returned value of type 'const float' to function return type '__hip_bfloat16'
/home/user/aphrodite-engine/kernels/attention/attention_kernels.hip:235:23: error: no matching function for call to 'vec_conversion'

Naomiusearch commented 6 months ago

I fixed some of them here, but I don't know if it actually works. It compiled after I disabled the gguf, squeezellm and gptq kernels, but it then fails at import time when I try to run it:

ImportError: /home/user/aphrodite-engine/venv/lib/python3.11/site-packages/aphrodite_engine-0.5.2+rocm603-py3.11-linux-x86_64.egg/aphrodite/_C.cpython-311-x86_64-linux-gnu.so: undefined symbol: _Z15ggml_dequantizeN2at6TensorEall
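
Decoding the mangled name shows which function is missing. The snippet below is a minimal sketch of a decoder that handles only the shape of this particular symbol (a top-level function whose parameters are nested names or a few builtin types); `demangle_simple` and `BUILTIN_TYPES` are names made up for this example, and in practice a tool such as `c++filt` does the job completely.

```python
# Minimal sketch: decodes only a small subset of the Itanium C++ ABI mangling,
# enough for the symbol reported in the ImportError above.
BUILTIN_TYPES = {
    "a": "signed char",
    "l": "long",
    "i": "int",
    "f": "float",
    "b": "bool",
}


def _read_name(sym, pos):
    """Read a length-prefixed identifier such as '6Tensor' starting at pos."""
    start = pos
    while sym[pos].isdigit():
        pos += 1
    length = int(sym[start:pos])
    return sym[pos:pos + length], pos + length


def demangle_simple(sym):
    """Decode '_Z<len><name><params>' for the limited cases handled here."""
    if not sym.startswith("_Z"):
        raise ValueError("not an Itanium-mangled symbol")
    name, pos = _read_name(sym, 2)
    params = []
    while pos < len(sym):
        code = sym[pos]
        if code == "N":
            # Nested name: N <component>+ E, e.g. N2at6TensorE -> at::Tensor
            pos += 1
            parts = []
            while sym[pos] != "E":
                part, pos = _read_name(sym, pos)
                parts.append(part)
            pos += 1  # skip the closing 'E'
            params.append("::".join(parts))
        elif code in BUILTIN_TYPES:
            params.append(BUILTIN_TYPES[code])
            pos += 1
        else:
            raise ValueError(f"unsupported type code {code!r}")
    return f"{name}({', '.join(params)})"


print(demangle_simple("_Z15ggml_dequantizeN2at6TensorEall"))
# ggml_dequantize(at::Tensor, signed char, long, long)
```

The missing symbol is `ggml_dequantize`, one of the GGUF functions. A plausible reading, not confirmed in this thread, is that the extension still references it after the GGUF kernels were removed from the build.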

AlpinDale commented 6 months ago

Sorry, I totally forgot about this issue. I have access to a Navi GPU now, so I will try to debug this. We will likely need to disable GGUF kernel compilation for non-NVIDIA GPUs. You can do this by moving the GGUF-related kernels in setup.py into the _is_cuda() block near the end.
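
As a rough illustration of that suggestion, the sketch below adds the GGUF sources only when building for CUDA. The list names, the `select_kernel_sources` helper and the exact file paths are assumptions made for this example (the paths are guessed from the `.hip` files in the error log); the real setup.py may be organised differently, and there the condition would come from `_is_cuda()`.

```python
# Hypothetical sketch of gating kernel sources by backend.
# File names are illustrative; the real list lives in the project's setup.py.
COMMON_SOURCES = [
    "kernels/attention/attention_kernels.cu",
    "kernels/pybind.cpp",
]

# Kernels that rely on CUDA-only intrinsics (e.g. __shfl_xor_sync, __vsub4)
# and therefore should not be compiled for ROCm.
CUDA_ONLY_SOURCES = [
    "kernels/quantization/gguf/gguf_kernel.cu",
]


def select_kernel_sources(is_cuda):
    """Return the source files to compile for the current backend."""
    sources = list(COMMON_SOURCES)
    if is_cuda:
        sources.extend(CUDA_ONLY_SOURCES)
    return sources


# On a ROCm build the GGUF kernel is left out entirely.
print(select_kernel_sources(is_cuda=False))
```

If the Python bindings still declare the GGUF functions on ROCm, the import would presumably fail with an undefined symbol, so those declarations would need the same guard.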