ROCm / ROCT-Thunk-Interface

ROCm's Thunk Interface

[Issue]: kfdtest build broke in 6.0 but not in 5.x #95

Closed jdgh000 closed 1 month ago

jdgh000 commented 8 months ago

Problem Description

kfdtest used to build fine in 5.7. In 6.0 it fails: to reproduce, go to the tests/kfdtest folder and follow the usual build instructions.

Operating System

rh9

CPU

ryzen

GPU

AMD Instinct MI250

ROCm Version

ROCm 6.0.0

ROCm Component

ROCT-Thunk-Interface

Steps to Reproduce

Go to the tests/kfdtest folder and follow the usual build instructions; the build fails.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

jdgh000 commented 8 months ago
CMAKE_PREFIX_PATH=/opt/rocm-6.0.0/lib/llvm/lib/cmake/llvm cmake ..
-- The C compiler identification is GNU 11.4.1
-- The CXX compiler identification is GNU 11.4.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
RESULT_VARIABLE 0 OUTPUT_VARIABLE: .el9
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.7.3")
-- Checking for module 'libdrm'
--   Found libdrm, version 2.4.117
-- Checking for module 'libdrm_amdgpu'
--   Found libdrm_amdgpu, version 2.4.117
-- Checking for module 'libhsakmt'
--   Package 'libhsakmt', required by 'virtual:world', not found
Find libhsakmt at
-- Couldn't find Lightning build in compute directory. Searching LLVM_DIR then defaulting to system LLVM install if still not found...
-- Could NOT find Terminfo (missing: Terminfo_LIBRARIES Terminfo_LINKABLE)
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.11")
-- Found zstd: /usr/lib64/libzstd.so
-- Found LibXml2: /usr/lib64/libxml2.so (found version "2.9.13")
-- Found LLVM 17.0.0git
-- Using LLVMConfig.cmake in: /opt/rocm-6.0.0/lib/llvm/lib/cmake/llvm
-- PROJECT_SOURCE_DIR:/root/extdir/gg/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/ROCT-Thunk-Interface/tests/kfdtest
-- Configuring done (1.0s)
CMake Error at /opt/rocm-6.0.0/lib/llvm/lib/cmake/llvm/LLVMExports.cmake:64 (set_target_properties):
  The link interface of target "LLVMSupport" contains:

    Terminfo::terminfo

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /opt/rocm-6.0.0/lib/llvm/lib/cmake/llvm/LLVMConfig.cmake:246 (include)
  CMakeLists.txt:140 (find_package)

-- Generating done (0.0s)
CMake Generate step failed.  Build files cannot be regenerated correctly.

kentrussell commented 8 months ago

I can't reproduce the issue on ROCm 6.0 on Ubuntu 22 or CentOS, but it keeps picking the installed version instead of the ROCm-specific one. Do you have llvm/clang installed on your system as well, or are you just using the one provided by ROCm?

jdgh000 commented 8 months ago

On CentOS, I now see the following errors in 5.6/5.7 as well as 6.0. I will check on clang.

-- Couldn't find Lightning build in compute directory. Searching LLVM_DIR then defaulting to system LLVM install if still not found...
-- Could NOT find Terminfo (missing: Terminfo_LIBRARIES Terminfo_LINKABLE)
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.11")
-- Found zstd: /usr/lib64/libzstd.so
-- Found LLVM 16.0.0git
-- Using LLVMConfig.cmake in: /opt/rocm-5.6.0/llvm/lib/cmake/llvm
-- PROJECT_SOURCE_DIR:/root/extdir/gg/git/codelab-scripts/build-install-scripts/rocm/ROCm-5.6/ROCT-Thunk-Interface/tests/kfdtest
-- Configuring done (0.9s)
CMake Error at /opt/rocm-5.6.0/llvm/lib/cmake/llvm/LLVMExports.cmake:59 (set_target_properties):
  The link interface of target "LLVMSupport" contains:

    Terminfo::terminfo

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /opt/rocm-5.6.0/llvm/lib/cmake/llvm/LLVMConfig.cmake:241 (include)
  CMakeLists.txt:140 (find_package)

-- Generating done (0.0s)
CMake Generate step failed.  Build files cannot be regenerated correctly.

jdgh000 commented 4 months ago

llvm/clang is from ROCm. For 5.4.3, the build works with:

CMAKE_PREFIX_PATH=/opt/rocm-5.4.3/llvm/lib/cmake/llvm/ cmake .. && make -j32
....
[ 97%] Building CXX object CMakeFiles/kfdtest.dir/src/RDMATest.cpp.o
[100%] Linking CXX executable kfdtest
[100%] Built target kfdtest
[root@localhost build]# cat /opt/rocm/.info/version
5.4.3-121

However, starting around the 6.0.0 release, the build appears broken. I got around the Terminfo error by installing the ghc-terminfo-devel package, but now I see a whole bunch of linker errors. The missing APIs are all defined in the ROCT source, so apparently something is amiss in CMake with pulling in the libhsakmt library at link time. Again, this is not an issue with 5.4.3, so something appears badly broken in 6.0.0 and onwards:

[  2%] Linking CXX executable kfdtest
/usr/bin/ld: CMakeFiles/kfdtest.dir/src/BaseDebug.cpp.o: in function `BaseDebug::Attach(kfd_runtime_info*, int, unsigned int, unsigned long)':
/root/extdir/gg/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/ROCT-Thunk-Interface/tests/kfdtest/src/BaseDebug.cpp:70: undefined reference to `hsaKmtDebugTrapIoctl'
/usr/bin/ld: CMakeFiles/kfdtest.dir/src/BaseDebug.cpp.o: in function `BaseDebug::Detach()':
/root/extdir/gg/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/ROCT-Thunk-Interface/tests/kfdtest/src/BaseDebug.cpp:90: undefined reference to `hsaKmtDebugTrapIoctl'
/usr/bin/ld: CMakeFiles/kfdtest.dir/src/BaseDebug.cpp.o: in function `BaseDebug::SendRuntimeEvent(unsigned long, int, int)':

@kentrussell
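
Both failures reported so far leave traces in the configure output before any linking happens: the first log contains "Could NOT find Terminfo" as well as "Package 'libhsakmt' ... not found", and the undefined hsaKmt* references are at least consistent with libhsakmt never being located. A minimal sketch for flagging these two lines in a captured log follows; the function name scan_configure_log and its output wording are hypothetical, not part of ROCT or CMake.

```shell
#!/bin/sh
# Sketch: flag the two configure-time symptoms quoted in this thread.
# Reads CMake configure output on stdin, prints one line per symptom found.
# scan_configure_log is a hypothetical helper name, not a ROCm tool.
scan_configure_log() {
    log="$(cat)"
    # Terminfo lookup failed while loading the LLVM CMake package.
    if printf '%s\n' "$log" | grep -q "Could NOT find Terminfo"; then
        echo "terminfo: not found at configure time"
    fi
    # pkg-config could not locate libhsakmt.
    if printf '%s\n' "$log" | grep -q "Package 'libhsakmt'.*not found"; then
        echo "libhsakmt: not found by pkg-config"
    fi
}

# Example input: two lines copied from the first log in this issue.
scan_configure_log <<'EOF'
--   Package 'libhsakmt', required by 'virtual:world', not found
-- Could NOT find Terminfo (missing: Terminfo_LIBRARIES Terminfo_LINKABLE)
EOF
```

In a build directory this could be fed with something like `cmake .. 2>&1 | scan_configure_log`, assuming the configure step prints the same messages as in the logs above.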

alexxu-amd commented 2 months ago

Hi @jdgh000, most of the time we leave CMAKE_PREFIX_PATH as /opt/rocm when compiling ROCm components. I was able to reproduce your error on ROCm 6.1.3 using the command you provided, and fixed it with:

CMAKE_PREFIX_PATH=/opt/rocm/ cmake .. && make -j32

Give it a try.
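
One plausible reason the prefix matters: CMake's pkg-config module can also search pkgconfig directories under each CMAKE_PREFIX_PATH entry, so pointing the prefix at the LLVM cmake directory may hide libhsakmt.pc (the first log in this issue shows "Package 'libhsakmt' ... not found"). The sketch below only checks whether a libhsakmt.pc file exists under a given prefix. The helper name hsakmt_pc_path and the three subdirectories it probes are assumptions about a typical install layout, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch: print the path of libhsakmt.pc under an install prefix, if present.
# Assumption: pkg-config files live in lib/pkgconfig, lib64/pkgconfig or
# share/pkgconfig below the prefix; adjust the list if your install differs.
# hsakmt_pc_path is a hypothetical helper name.
hsakmt_pc_path() {
    prefix="${1:-/opt/rocm}"
    for sub in lib/pkgconfig lib64/pkgconfig share/pkgconfig; do
        if [ -f "$prefix/$sub/libhsakmt.pc" ]; then
            echo "$prefix/$sub/libhsakmt.pc"
            return 0
        fi
    done
    return 1
}

# Compare the prefix from the original report with the suggested one.
for p in /opt/rocm-6.0.0/lib/llvm/lib/cmake/llvm /opt/rocm; do
    echo "$p: $(hsakmt_pc_path "$p" || echo "libhsakmt.pc not found")"
done
```

If the file only turns up under /opt/rocm, that would be consistent with the configure step finding libhsakmt once CMAKE_PREFIX_PATH points there.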

In addition, ROCT-Thunk-Interface has recently been integrated into ROCR-Runtime under ROCR-Runtime/libhsakmt. For the latest changes, please refer to https://github.com/ROCm/ROCR-Runtime

alexxu-amd commented 1 month ago

@jdgh000 I'm closing the ticket now. Feel free to reopen if you need further assistance!