[Issue]: no cross-platform compilation

elsampsa commented 2 months ago

Problem Description

HIP is supposed to compile and produce binaries that can run on NVIDIA hardware and even on the CPU (i.e. it's cross-platform).

This feature seems to be badly documented and outright broken.

First of all, nowhere is it clearly explained what actually happens in this process (please link to the docs if you disagree).

There is some online lore saying that the hipcc command does the hip -> cuda conversion "on-the-fly" (presumably with "hipify") and after that uses nvcc to compile and produce the binary.

So let's see if we can do that:

HIP_PLATFORM=nvidia hipcc hello_world.hip -o hello

We get:

sh: 1: /usr/local/cuda/bin/nvcc: not found

To whom it may concern: in modern Ubuntu distros, the NVIDIA files are spread across the filesystem and no longer live strictly under /usr/local/cuda (see this). There seems to be no way to inform hipcc about this.

We can do a hack with

sudo ln -s /usr /usr/local/cuda

Now we get:

hipcc hello_world.hip -o hello
nvcc fatal   : Don't know what to do with 'hello_world.hip'

So that didn't go as expected: hipcc passes the .hip file as-is to nvcc.

Maybe there's some additional package one should install? Let's try hipcc-nvidia:

sudo apt-get install hipcc-nvidia
...
The following NEW packages will be installed:
  hipcc-nvidia
0 upgraded, 1 newly installed, 0 to remove and 42 not upgraded.
Need to get 219 kB of archives.
After this operation, 575 kB of additional disk space will be used.
Get:1 https://repo.radeon.com/rocm/apt/6.2 jammy/main amd64 hipcc-nvidia amd64 1.1.1.60200-66~22.04 [219 kB]
...
Unpacking hipcc-nvidia (1.1.1.60200-66~22.04) ...
dpkg: error processing archive /var/cache/apt/archives/hipcc-nvidia_1.1.1.60200-66~22.04_amd64.deb (--unpack):
 trying to overwrite '/opt/rocm-6.2.0/bin/hipcc', which is also in package hipcc 1.1.1.60200-66~22.04
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/hipcc-nvidia_1.1.1.60200-66~22.04_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

How about hip-runtime-nvidia?

sudo apt-get install hip-runtime-nvidia
...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 hip-runtime-nvidia : Depends: cuda (>= 7.5) but it is not installable

But I have cuda 12 installed from the official ubuntu repo (maybe this is related to the first problem I reported here).

Here is an issue that's been open since 13. June: https://github.com/ROCm/HIP/issues/3521 Here is another one since March: https://github.com/ROCm/ROCm/issues/2975

So as of today, state-of-the-art ROCm/HIP doesn't seem to work on NVIDIA, on the CPU, or on any hardware other than AMD's.

A single-file (hello_world.hip) example with the appropriate compilation command and necessary apt install commands would be highly appreciated - if the cross-platform feature even works in the first place.

I couldn't find anything consistent or definitive in either the docs or the FAQ.

Operating System

Ubuntu 22.04

CPU

nvidia geforce rtx 4070

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

HIP

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

cjatin commented 2 months ago

Regarding the docs, I will try to come up with something that explains this better. The short version is that HIP functions are mapped to CUDA functions, so something like hipMalloc will call cudaMalloc.

Regarding the compiler invocation, can you share the output of: HIPCC_VERBOSE=1 hipcc... <your command>

elsampsa commented 2 months ago

Here is a one-liner to test:

touch /tmp/hello.hip; HIP_PLATFORM=nvidia hipcc -v /tmp/hello.hip -o /tmp/hello

Which gives the error nvcc fatal : Don't know what to do with '/tmp/hello.hip'.

Here is a more detailed output:

#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda/bin
#$ _THERE_=/usr/local/cuda/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/usr/local/cuda/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib:
#$ PATH=/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/home/sampsa/anaconda3/condabin:/home/sampsa/.nix-profile/bin:/home/sampsa/.local/bin:/home/sampsa/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/sampsa/go/bin:/usr/local/go/bin:/home/sampsa/.local/bin:/usr/local/cuda/bin:/opt/rocm-6.2.0/bin:/home/sampsa/python3_packages/pkg_tools
#$ INCLUDES="-I/usr/local/cuda/bin/../targets/x86_64-linux/include"  
#$ LIBRARIES=  "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
nvcc fatal   : Don't know what to do with '/tmp/hello.hip'
failed to execute:/usr/local/cuda/bin/nvcc  -Wno-deprecated-gpu-targets  -isystem /usr/local/cuda/include -isystem "/opt/rocm-6.2.0/include"  -Wno-deprecated-gpu-targets -lcuda -lcudart -L/usr/local/cuda/lib64  -v /tmp/hello.hip -o "/tmp/hello"

jamesxu2 commented 1 month ago

Hi @elsampsa,

> To whom it may concern: in modern ubuntu distros, nvidia stuff is all over the place now, not anymore strictly under /usr/local/cuda (see this). There seems to be no way to inform hipcc about this.

You can set the CUDA_PATH environment variable.
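As a rough sketch of how that could look on a system where nvcc is not under /usr/local/cuda: the verbose output earlier in this thread shows hipcc calling bin/nvcc, include and lib64 under the CUDA root, so the root can be taken as the parent of the directory holding nvcc. The helper name `cuda_root_from_nvcc` is made up for illustration, and whether the resulting root has the layout hipcc expects on a given distro is an assumption to verify.

```shell
# Hypothetical helper (not part of ROCm): derive a CUDA root directory from
# the full path of the nvcc binary by stripping the trailing /bin/nvcc.
cuda_root_from_nvcc() {
    dirname "$(dirname "$1")"
}

# Usage, assuming nvcc is on PATH:
#   export CUDA_PATH="$(cuda_root_from_nvcc "$(command -v nvcc)")"
#   HIP_PLATFORM=nvidia HIPCC_VERBOSE=1 hipcc <your arguments>
```

With nvcc at /usr/bin/nvcc this gives CUDA_PATH=/usr, which has the same effect as the `ln -s /usr /usr/local/cuda` hack without touching the filesystem.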

> There is some online lore saying that the hipcc command does the hip -> cuda conversion "on-the-fly" (presumably with "hipify") and after that uses nvcc to compile and produce the binary. So didn't go as expected. hipcc passes the .hip file as-is to nvcc.

Adding to what @cjatin said, the point of HIP is that you don't need to transform the source to allow it to compile for AMD or Nvidia devices; HIP functions are internally aliased to the correct platform-specific API calls. You would typically include <hip/hip_runtime.h>, which performs this aliasing by conditionally including either the AMD APIs or the Nvidia APIs. HIPIFY is for the reverse direction: converting CUDA code into HIP code so that it can be compiled for either platform.
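To make the aliasing concrete, here is a minimal single-file sketch, wrapped in a shell function so it can be dropped into a script. The function name `write_hello_hip` is hypothetical. The program only uses HIP names (`hipDeviceSynchronize`, `hipGetErrorString`) and the usual kernel-launch syntax, and it has not been verified on both platforms here, so treat it as a starting point rather than a tested reference.

```shell
# Hypothetical helper: write a minimal HIP program to the path given as $1.
write_hello_hip() {
    cat > "$1" <<'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>

// Each GPU thread prints its own index.
__global__ void hello() {
    printf("hello from thread %d\n", static_cast<int>(threadIdx.x));
}

int main() {
    hello<<<1, 4>>>();                        // launch 1 block of 4 threads
    hipError_t err = hipDeviceSynchronize();  // wait for the kernel to finish
    if (err != hipSuccess) {
        std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(err));
        return 1;
    }
    return 0;
}
EOF
}
```

Calling `write_hello_hip hello_world.hip` creates the source file. The same text is meant to be handed to either compiler, which is why no CUDA names appear in it.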

> Which gives the error nvcc fatal : Don't know what to do with '/tmp/hello.hip'.

This is because nvcc only supports a limited set of file extensions like .cpp and .cu, and .hip is not one of them. If you just rename your file to use one of those supported file extensions, you should be able to compile successfully. No need to install extra packages.
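A small sketch of the rename step, assuming you would rather keep the original .hip file and compile a copy. Both helper names (`cu_name_for`, `stage_for_nvcc`) are invented for this example, and the hipcc line in the comment assumes hipcc and nvcc are installed and that CUDA_PATH is set if needed.

```shell
# Hypothetical helper: map a .hip file name to the .cu name nvcc recognizes.
# Names without a .hip suffix simply get .cu appended.
cu_name_for() {
    printf '%s\n' "${1%.hip}.cu"
}

# Hypothetical helper: copy the source next to itself under the .cu name and
# print the new path; the original file is left untouched.
stage_for_nvcc() {
    staged="$(cu_name_for "$1")"
    cp -- "$1" "$staged" && printf '%s\n' "$staged"
}

# Usage, assuming hipcc and nvcc are installed:
#   HIP_PLATFORM=nvidia hipcc "$(stage_for_nvcc hello_world.hip)" -o hello
```

Copying instead of renaming keeps the .hip file available for the AMD build, at the cost of having two files to keep in sync.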

I will check internally to see whether we plan to do something about this. The error that nvcc throws when you pass it a .hip file is confusing, though I'm not sure that renaming your source file before sending it to nvcc is the ideal solution.

jamesxu2 commented 4 weeks ago

@elsampsa An update on this: there is no workaround that lets hipcc hand a .hip file to nvcc and have it compile. You will have to manually rename the file to an extension supported by nvcc (which we at AMD have no control over).

One hitch is that if you change the HIP source file's extension to .cu, you'll end up with the opposite problem: nvcc recognizes the file extension, but HIP-Clang does not. The workaround for this is to pass -xhip to clang to indicate that the source file is indeed a HIP file.
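Putting the two cases side by side, a sketch of the per-platform command for a single .cu file containing HIP code might look like this. The helper `hipcc_cmd_for` is hypothetical and only prints the command. It assumes hipcc forwards -xhip to clang unchanged on the AMD side, which is worth confirming with HIPCC_VERBOSE=1.

```shell
# Hypothetical helper: print the hipcc command for a platform (amd|nvidia),
# a source file (a .cu file containing HIP code) and an output name.
hipcc_cmd_for() {
    case "$1" in
        # nvcc already recognizes .cu, so no extra flag is needed.
        nvidia) printf 'HIP_PLATFORM=nvidia hipcc %s -o %s\n' "$2" "$3" ;;
        # clang needs to be told that the .cu file is HIP, not CUDA.
        amd)    printf 'HIP_PLATFORM=amd hipcc -xhip %s -o %s\n' "$2" "$3" ;;
        *)      printf 'unknown platform: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Usage: print the command, inspect it, then run it by hand.
#   hipcc_cmd_for amd hello_world.cu hello
```

Printing rather than executing keeps the sketch safe to run on a machine that has neither toolchain installed.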

jamesxu2 commented 1 week ago

Hi @elsampsa, I hope this has answered your question. Feel free to reopen this issue or submit a new one if you have followup queries and we'll be happy to help. I will close this issue for now due to inactivity.
