PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Installation]: AMD MI60 (gfx906) installation errors with ROCm 6.1 and 6.2 #774

Open Said-Akbar opened 1 day ago

Said-Akbar commented 1 day ago

Your current environment

python env.py
Collecting environment information...
PyTorch version: 2.6.0.dev20241011+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-dd7f95766

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.8.0-45-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Radeon Graphics (gfx906:sramecc+:xnack-)
Nvidia driver version: 550.90.07
cuDNN version: Could not collect
HIP runtime version: 6.2.41133
MIOpen runtime version: 3.2.0
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               32
On-line CPU(s) list:                  0-31
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen 9 5950X 16-Core Processor
CPU family:                           25
Model:                                33
Thread(s) per core:                   2
Core(s) per socket:                   16
Socket(s):                            1
Stepping:                             0
Frequency boost:                      enabled
CPU max MHz:                          5083.3979
CPU min MHz:                          2200.0000
BogoMIPS:                             6800.12
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
L1d cache:                            512 KiB (16 instances)
L1i cache:                            512 KiB (16 instances)
L2 cache:                             8 MiB (16 instances)
L3 cache:                             64 MiB (2 instances)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-31
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-triton-rocm==3.1.0+cf34004b8a
[pip3] pyzmq==26.2.0
[pip3] torch==2.6.0.dev20241011+rocm6.2
[pip3] torchaudio==2.5.0.dev20241011+rocm6.2
[pip3] torchvision==0.20.0.dev20241011+rocm6.2
[pip3] transformers==4.44.1
[conda] Could not collect
ROCM Version: 6.2.41134-65d174c3e
Neuron SDK Version: N/A
Aphrodite Version: N/A
Aphrodite Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X  0-31    0       N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How did you install Aphrodite?

python3 -m venv myenv && source myenv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2/
git clone https://github.com/PygmalionAI/aphrodite-engine.git
cd aphrodite-engine
pip install -U -r requirements-rocm.txt
pip install ninja==1.10.2.4 # for compatibility with the installer.
# This is undocumented, but I had to change the default on line 20 of setup.py from 'cuda' to 'rocm':
# APHRODITE_TARGET_DEVICE = os.getenv("APHRODITE_TARGET_DEVICE", "rocm")
python3 setup.py develop
# Initially, the command above failed because the thrust library was not compatible with ROCm. It turned out the build was
# using NVIDIA's thrust located at /usr/include/thrust. I could not find which env var was responsible for that, so I removed
# NVIDIA's thrust folder from /usr/include/thrust and copied in AMD's thrust folder from rocm-6.2.2/include/thrust/

I have 2x AMD MI60 and 1x RTX 3060 for video output. I want to install aphrodite-engine and run it on the two AMD GPUs. I installed ROCm and PyTorch with all the dependencies. It took me a few hours to find out that I needed to change line 20 of setup.py to APHRODITE_TARGET_DEVICE = os.getenv("APHRODITE_TARGET_DEVICE", "rocm"). After that, I struggled with the wrong thrust library being used: CMake was picking up NVIDIA's thrust, which was installed for my NVIDIA GPU. Eventually I figured out where AMD's thrust folder was and replaced NVIDIA's thrust with AMD's.
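Since the setup.py line quoted above reads the value with os.getenv, the same effect can probably be had by setting APHRODITE_TARGET_DEVICE in the environment instead of editing the file. The sketch below only mirrors that one lookup to show the precedence; the helper name resolve_target_device is hypothetical, and it assumes setup.py does not consult anything else when picking the target.

```python
import os


def resolve_target_device(environ=None, default="cuda"):
    """Mirror the lookup quoted from setup.py line 20.

    Hypothetical helper for illustration: the environment variable wins,
    and the hard-coded default is used only when the variable is unset.
    """
    env = os.environ if environ is None else environ
    return env.get("APHRODITE_TARGET_DEVICE", default)


# Variable unset: the stock default applies.
print(resolve_target_device({}))  # cuda

# Variable set (e.g. exported in the shell before running setup.py).
print(resolve_target_device({"APHRODITE_TARGET_DEVICE": "rocm"}))  # rocm
```

Under that assumption, exporting APHRODITE_TARGET_DEVICE=rocm in the shell before running python3 setup.py develop should select the ROCm path without touching the source tree.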

Finally, the engine started compiling, but the build failed at the end with multiple warnings and errors. I tried both ROCm 6.1 and 6.2, and both failed with the same error. The error output is around 6k lines, so I am attaching it as a text file: errors6_2_w_thrust.txt. Here are some of the warnings and errors from that file:

[1/21] Building CXX object CMakeFiles/_core_C.dir/kernels/core/torch_bindings.cpp.o
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
...
[7/21] Building HIP object CMakeFiles/_C.dir/kernels/hip_utils_kernels.hip.o
/home/saidp/Downloads/amd_llm/aphrodite-engine/build/temp.linux-x86_64-3.10/kernels/hip_utils_kernels.hip:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
    9 |     hipGetDevice(&device);
      |     ^~~~~~~~~~~~ ~~~~~~~
/home/saidp/Downloads/amd_llm/aphrodite-engine/build/temp.linux-x86_64-3.10/kernels/hip_utils_kernels.hip:13:3: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   13 |   hipDeviceGetAttribute(&value, static_cast<hipDeviceAttribute_t>(attribute),
      |   ^~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   14 |                          device);
      |                          ~~~~~~
2 warnings generated when compiling for gfx906.
/home/saidp/Downloads/amd_llm/aphrodite-engine/build/temp.linux-x86_64-3.10/kernels/hip_utils_kernels.hip:9:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
    9 |     hipGetDevice(&device);
      |     ^~~~~~~~~~~~ ~~~~~~~
/home/saidp/Downloads/amd_llm/aphrodite-engine/build/temp.linux-x86_64-3.10/kernels/hip_utils_kernels.hip:13:3: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   13 |   hipDeviceGetAttribute(&value, static_cast<hipDeviceAttribute_t>(attribute),
      |   ^~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   14 |                          device);
      |                          ~~~~~~
2 warnings generated when compiling for host.
[8/21] Building HIP object CMakeFiles/_C.dir/kernels/attention/attention_kernels.hip.o
FAILED: CMakeFiles/_C.dir/kernels/attention/attention_kernels.hip.o 
/opt/rocm-6.2.2/lib/llvm/bin/clang++  -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_PROF_API=1 -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -D__HIP_PLATFORM_AMD__ -D__HIP_PLATFORM_AMD__=1 -D__HIP_ROCclr__=1 -I/home/saidp/Downloads/amd_llm/aphrodite-engine/kernels -isystem /usr/include/python3.10 -isystem /home/saidp/Downloads/amd_llm/myenv/lib/python3.10/site-packages/torch/include -isystem /home/saidp/Downloads/amd_llm/myenv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/rocm-6.2.2/include/hiprand -O2 -g -DNDEBUG -std=gnu++20 --offload-arch=gfx906 --offload-arch=gfx906 -fPIC -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DUSE_ROCM -DENABLE_FP8 -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF_OPERATORS__ -fno-gpu-rdc -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_HIP_VERSION=602 -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-duplicate-decl-specifier -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -std=c++17 -DHIP_NEW_TYPE_ENUMS -MD -MT CMakeFiles/_C.dir/kernels/attention/attention_kernels.hip.o -MF CMakeFiles/_C.dir/kernels/attention/attention_kernels.hip.o.d -o CMakeFiles/_C.dir/kernels/attention/attention_kernels.hip.o -x hip -c /home/saidp/Downloads/amd_llm/aphrodite-engine/build/temp.linux-x86_64-3.10/kernels/attention/attention_kernels.hip
/home/saidp/Downloads/amd_llm/aphrodite-engine/build/temp.linux-x86_64-3.10/kernels/attention/attention_kernels.hip:746:7: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
  746 |       LAUNCH_PAGED_ATTENTION_V1(64);
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/saidp/Downloads/amd_llm/aphrodite-engine/build/temp.linux-x86_64-3.10/kernels/attention/attention_kernels.hip:676:3: note: expanded from macro 'LAUNCH_PAGED_ATTENTION_V1'
  676 |   APHRODITE_DevFuncAttribute_SET_MaxDynamicSharedMemorySize(                   \
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  677 |       ((void*)aphrodite::paged_attention_v1_kernel<                            \
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  678 |           T, CACHE_T, HEAD_SIZE, BLOCK_SIZE, NUM_THREADS, KV_DTYPE,            \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  679 |           IS_BLOCK_SPARSE>),                                                   \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  680 |       shared_mem_size);                                                        \
      |       ~~~~~~~~~~~~~~~~
...
In file included from /opt/rocm-6.2.2/lib/llvm/lib/clang/18/include/__clang_hip_runtime_wrapper.h:143:
/opt/rocm-6.2.2/lib/llvm/lib/clang/18/include/__clang_hip_cmath.h:400:20: error: call to '__test' is ambiguous
  400 |   typedef decltype(__test(declval<_Tp>())) type;
      |                    ^~~~~~
...
332 warnings and 1 error generated when compiling for gfx906.
[9/21] Building HIP object CMakeFiles/_C.dir/kernels/moe/align_block_size_kernel.hip.o
[10/21] Building HIP object CMakeFiles/_C.dir/kernels/quantization/squeezellm/quant_hip_kernel.hip.o
[11/21] Building HIP object CMakeFiles/_C.dir/kernels/quantization/compressed_tensors/int8_quant_kernels.hip.o
[12/21] Building HIP object CMakeFiles/_C.dir/kernels/prepare_inputs/advance_step.hip.o
...
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/saidp/Downloads/amd_llm/aphrodite-engine/setup.py", line 461, in <module>
    setup(
  File "/home/saidp/Downloads/amd_llm/myenv/lib/python3.10/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/saidp/Downloads/amd_llm/myenv/lib/python3.10/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/home/saidp/Downloads/amd_llm/myenv/lib/python3.10/site-packages/setuptools/command/develop.py", line 114, in install_for_development
    self.run_command('build_ext')
  File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/saidp/Downloads/amd_llm/myenv/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/home/saidp/Downloads/amd_llm/aphrodite-engine/setup.py", line 223, in build_extensions
    subprocess.check_call(["cmake", *build_args], cwd=self.build_temp)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=32', '--target=_core_C', '--target=_moe_C', '--target=_C']' returned non-zero exit status 1.

Please let me know whether this is a version mismatch or a bug in the engine. I am looking forward to a fix.

Thank you!

Said-Akbar commented 1 day ago

From the logs above, it looks like the build is failing to compile the paged attention kernels:

FAILED: CMakeFiles/_C.dir/kernels/attention/attention_kernels.hip.o
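With a log of roughly 6k lines that is mostly warnings, it can help to filter out just the failed targets and the hard errors. The following is a minimal sketch, not part of the engine: the function name summarize_build_log and both regular expressions are assumptions based only on the line shapes visible in the excerpt above (ninja "FAILED:" lines and clang "file:line:col: error:" diagnostics).

```python
import re

# Line shapes assumed from the excerpt in this issue.
FAILED_RE = re.compile(r"^FAILED:\s+(\S+)")
ERROR_RE = re.compile(
    r"^(?P<file>[^\s:][^:]*):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def summarize_build_log(lines):
    """Return (failed_targets, errors) found in an iterable of log lines.

    errors is a list of (file, line_number, message) tuples. Warnings and
    "In file included from" context lines are ignored.
    """
    failed, errors = [], []
    for raw in lines:
        line = raw.rstrip()
        match = FAILED_RE.match(line)
        if match:
            failed.append(match.group(1))
            continue
        match = ERROR_RE.match(line)
        if match:
            errors.append(
                (match.group("file"), int(match.group("line")), match.group("msg"))
            )
    return failed, errors


# Lines copied from the excerpt posted in this issue.
SAMPLE = [
    "FAILED: CMakeFiles/_C.dir/kernels/attention/attention_kernels.hip.o ",
    "In file included from /opt/rocm-6.2.2/lib/llvm/lib/clang/18/include/__clang_hip_runtime_wrapper.h:143:",
    "/opt/rocm-6.2.2/lib/llvm/lib/clang/18/include/__clang_hip_cmath.h:400:20: error: call to '__test' is ambiguous",
    "332 warnings and 1 error generated when compiling for gfx906.",
]

failed_targets, hard_errors = summarize_build_log(SAMPLE)
print(failed_targets)
print(hard_errors)
```

To run it over the full attachment, the same function could be called on an open file handle, for example summarize_build_log(open("errors6_2_w_thrust.txt")). On the excerpt, the only hard error it reports is in the ROCm clang header __clang_hip_cmath.h rather than in the kernel source itself.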

Naomiusearch commented 6 hours ago

It's actually an issue with ROCm. There's a fix, though. Also, Aphrodite doesn't work on AMD right now anyway.