ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License

[Issue]: Conversion of tiny-cuda-nn lib into HIP #3527

Open Vishal-S-P opened 2 weeks ago

Vishal-S-P commented 2 weeks ago

Problem Description

I am facing issues converting CUDA code to HIP using the PyTorch CUDAExtension approach. Please see the Steps to Reproduce section.

Operating System

OS: NAME="Ubuntu" VERSION="22.04.3 LTS (Jammy Jellyfish)"

CPU

AMD EPYC 7773X 64-Core Processor

GPU

AMD Instinct MI250X

ROCm Version

ROCm 6.0.0

ROCm Component

HIPIFY

Steps to Reproduce

I am trying to convert the CUDA code from https://github.com/NVlabs/tiny-cuda-nn to HIP and compile the PyTorch extension. Here is the setup.py I am using:

my_setup.txt

Additionally, I converted the header files in https://github.com/NVlabs/tiny-cuda-nn/tree/master/include/tiny-cuda-nn using the shell script below:

#!/bin/bash

CUDA_DIR="../../include/tiny-cuda-nn"
HIP_DIR="../../include/tiny-cuda-nn"
find $CUDA_DIR -type f \( -iname \*.h \) -exec sh -c '
  for file; do
    hipfile="$HIP_DIR/${file#$CUDA_DIR/}"
    mkdir -p "$(dirname "$hipfile")"
    echo "Converting $file -> $hipfile"
    hipify-perl "$file" -print-stats -inplace
  done
' sh {} +

You can reproduce the following error -

/dockerx/Text-to-3D-Models-on-AMD-GPUs/tiny-cuda-nn/include/tiny-cuda-nn/vec.h:303:53: error: invalid input constraint 'l' in asm
  303 |  asm ("red.relaxed.gpu.global.add.f32 [%0], %1;" :: "l"(addr), "r"(in_int));
      |                                                     ^
/dockerx/Text-to-3D-Models-on-AMD-GPUs/tiny-cuda-nn/include/tiny-cuda-nn/vec.h:329:61: error: invalid input constraint 'l' in asm
  329 |  asm ("red.relaxed.gpu.global.add.noftz.f16x2 [%0], %1;" :: "l"(addr), "r"(in_int));

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

b-sumner commented 2 weeks ago

@Vishal-S-P somehow the HIP compiler is seeing that inline PTX at line 329 of vec.h and that certainly won't work. Apparently the guard "#if TCNN_MIN_GPU_ARCH >= 70" is somehow passing. That needs to be fixed.
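One way such a guard could be tightened is sketched below. This is only an illustration under stated assumptions, not the actual tiny-cuda-nn code: TCNN_USE_INLINE_PTX, uses_inline_ptx and add_relaxed are hypothetical names, the default of 70 mirrors the value passed by the setup.py in this thread, and the sketch relies on hipcc defining __HIP_PLATFORM_AMD__ when it targets AMD GPUs. The idea is to take the inline-PTX path only when the architecture check passes and the code is not being built for AMD, and to fall back to a plain atomicAdd otherwise.

```cpp
// Assumption: the build passes -DTCNN_MIN_GPU_ARCH=70, as the setup.py in this thread does.
#ifndef TCNN_MIN_GPU_ARCH
#define TCNN_MIN_GPU_ARCH 70
#endif

// hipcc defines __HIP_PLATFORM_AMD__ when it targets AMD GPUs.
// TCNN_USE_INLINE_PTX is a hypothetical helper macro, not part of tiny-cuda-nn.
#if TCNN_MIN_GPU_ARCH >= 70 && !defined(__HIP_PLATFORM_AMD__)
#define TCNN_USE_INLINE_PTX 1
#else
#define TCNN_USE_INLINE_PTX 0
#endif

// Lets host code (or a test) see which path the preprocessor selected.
constexpr bool uses_inline_ptx() { return TCNN_USE_INLINE_PTX != 0; }

#if defined(__CUDACC__) || defined(__HIPCC__)
// Hypothetical wrapper, compiled only by nvcc or hipcc.
__device__ inline void add_relaxed(float* addr, float val) {
#if TCNN_USE_INLINE_PTX
	// NVIDIA-only path: the PTX reduction quoted in the error message above.
	int in_int = __float_as_int(val);
	asm ("red.relaxed.gpu.global.add.f32 [%0], %1;" :: "l"(addr), "r"(in_int));
#else
	// Portable fallback: atomicAdd on float* is available in both CUDA and HIP.
	atomicAdd(addr, val);
#endif
}
#endif
```

With a guard of this shape, the value 70 can still be passed on the command line: on an AMD build the extra platform check is what keeps the PTX away from the HIP compiler.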

Vishal-S-P commented 2 weeks ago

I am passing the following definition, with compute_capability hardcoded to 70:

definitions = base_definitions + [f"-DTCNN_MIN_GPU_ARCH={compute_capability}"]

Should I not use 70?