ROCm / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Errors when building on an MI250x server with ROCm 5.7 and PyTorch 2.2.1 #132

Open netw0rkf10w opened 6 months ago

netw0rkf10w commented 6 months ago

Describe the Bug

When installing on an MI250x server with ROCm 5.7 and PyTorch 2.2.1, I obtained the following errors:

 /home/user/code/apex/csrc/fused_dense_hip.hip:63:7: error: use of undeclared identifier 'CUBLAS_COMPUTE_64F'
        CUBLAS_COMPUTE_64F,
        ^
  /home/user/code/apex/csrc/fused_dense_hip.hip:101:7: error: use of undeclared identifier 'CUBLAS_COMPUTE_32F'
        CUBLAS_COMPUTE_32F,
        ^
  /home/user/code/apex/csrc/fused_dense_hip.hip:139:7: error: use of undeclared identifier 'CUBLAS_COMPUTE_16F'
        CUBLAS_COMPUTE_16F,
        ^
  3 errors generated when compiling for gfx90a.

Minimal Steps/Code to Reproduce the Bug

```bash
git clone https://github.com/ROCm/apex.git
cd apex
git checkout release/1.2.0
export HCC_AMDGPU_TARGET=gfx90a
export PYTORCH_ROCM_ARCH=gfx90a
export GPU_ARCHS=gfx90a
export MAX_JOBS=8
pip install -v --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ --user
```

Expected Behavior

Environment

Collecting environment information...
PyTorch version: 2.2.1+rocm5.7
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 5.7.31921-d1770ee1b

OS: Red Hat Enterprise Linux release 8.8 (Ootpa) (x86_64)
GCC version: (GCC) 12.2.1 20221121 (Red Hat 12.2.1-7)
Clang version: 15.0.7 (Red Hat 15.0.7-1.module+el8.8.0+17939+b58878af)
CMake version: version 3.28.3
Libc version: glibc-2.28

Python version: 3.10.10 (main, Apr 14 2023, 19:33:04) [GCC 10.3.1 20210422 (Red Hat 10.3.1-1)] (64-bit runtime)
Python platform: Linux-4.18.0-477.10.1.el8_8.x86_64-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 5.7.31921
MIOpen runtime version: 2.20.0
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               48
Model name:          AMD EPYC 7A53 64-Core Processor
Stepping:            1
CPU MHz:             2000.000
CPU max MHz:         3541.0149
CPU min MHz:         1500.0000
BogoMIPS:            3992.70
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-15,64-79
NUMA node1 CPU(s):   16-31,80-95
NUMA node2 CPU(s):   32-47,96-111
NUMA node3 CPU(s):   48-63,112-127
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-triton-rocm==2.2.0
[pip3] torch==2.2.1+rocm5.7
[pip3] torchaudio==2.2.1+rocm5.7
[pip3] torching==0.0.1
[pip3] torchvision==0.17.1+rocm5.7
[conda] Could not collect
etiennemlb commented 4 months ago
Because development does not happen on a separate develop branch, pulling master can give you an unfinished version. The officially supported PyTorch <-> ROCm apex pairings are:

| APEX Version | APEX branch | Torch Version |
| --- | --- | --- |
| 1.3.0 | master | 2.3 |
| 1.2.0 | release/1.2.0 | 2.2 |
| 1.1.0 | release/1.1.0 | 2.1 |
| 1.0.0 | release/1.0.0 | 2.0 and older |

So be sure to do something like `git clone -b release/1.2.0 ..` if you are working with torch 2.2, which relies on ROCm 5.7.
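As a sketch of the pairing above, here is a small helper (hypothetical, not part of apex) that maps an installed torch version string, e.g. `torch.__version__`, to the matching apex branch from the support table:

```python
# Hypothetical helper: pick the ROCm/apex branch matching a torch version,
# following the support table above. Not part of apex itself.
def apex_branch_for_torch(torch_version: str) -> str:
    # Strip any local-version suffix such as "+rocm5.7", keep major.minor.
    major, minor = (int(x) for x in torch_version.split("+")[0].split(".")[:2])
    if (major, minor) >= (2, 3):
        return "master"           # apex 1.3.0
    if (major, minor) == (2, 2):
        return "release/1.2.0"    # apex 1.2.0 (ROCm 5.7)
    if (major, minor) == (2, 1):
        return "release/1.1.0"    # apex 1.1.0
    return "release/1.0.0"        # apex 1.0.0, torch 2.0 and older

print(apex_branch_for_torch("2.2.1+rocm5.7"))  # prints release/1.2.0
```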

etiennemlb commented 4 months ago

Also, a fix landed in d89d4bd ("Correct the CUBLAS_COMPUTE type"). You can find the HIPIFY-supported conversions (CUDA -> HIP) here: https://rocm.docs.amd.com/projects/HIPIFY/en/latest/tables/CUBLAS_API_supported_by_HIP.html
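The build errors above come from `CUBLAS_COMPUTE_*` identifiers that survived hipification into a `.hip` file, where no CUDA headers declare them. As a rough diagnostic (a sketch, not an apex tool; the function name and return shape are made up here), one can scan hipified sources for such leftovers before attempting a build:

```python
# Hypothetical check: scan hipified sources for CUDA-only CUBLAS_COMPUTE_*
# identifiers that hipify left untranslated -- the symptom in this issue.
import re
from pathlib import Path

def untranslated_compute_types(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, identifier) for each leftover match."""
    hits = []
    pattern = re.compile(r"\bCUBLAS_COMPUTE_\w+\b")
    for path in Path(root).rglob("*.hip"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if m := pattern.search(line):
                hits.append((str(path), lineno, m.group()))
    return hits
```

Running this over `csrc/` on the broken checkout would flag the three sites in `fused_dense_hip.hip` reported by the compiler.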

netw0rkf10w commented 4 months ago

@etiennemlb Thanks a lot! As you can see, I already had `git checkout release/1.2.0` in my commands, so I was using the right branch. I'm going to try again with https://github.com/ROCm/apex/commit/d89d4bd2bf15abda83113f2d0c846ff01fd90567.