ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

[Bug]: build is breaking with hip_lite. #1345

Closed jdgh000 closed 1 year ago

jdgh000 commented 1 year ago

Describe the bug

The --help output shows the following for the logic command-line parameter:

  -l TENSILE_LOGIC, --logic TENSILE_LOGIC
                        Specify the Tensile logic target, e.g., asm_full,
                        asm_lite, etc. (optional, default: asm_full)

When I build, it breaks (see the log below).

To Reproduce

Version: ROCm 5.5, rocBLAS built from source. Steps to reproduce the behavior:

./install.sh -ida gfx908 -l hip_lite

Expected behavior

The build completes successfully.

Log-files

Partial log (the tail, where the error occurs) ...


#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Alik_Bljk_HHS_BH.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Alik_Bjlk_HB.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Alik_Bljk_HSS_BH.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_AlikC_BjlkC_ZB.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Ailk_Bljk_ZB.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Ailk_Bljk_BBS_BH.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Ailk_Bjlk_HSS_BH.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_AlikC_Bljk_ZB.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_AlikC_BjlkC_CB.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Alik_Bjlk_CB.yaml
#   /home/jd/ROCm-5.5/rocBLAS/library/src/blas3/Tensile/Logic/hip_lite/hip_Cijk_Alik_Bljk_HB.yaml
Reading logic files: Launching 96 threads...
Reading logic files: Done.
[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||] 100% (0.2 secs elapsed)
# Writing Custom CMake
# Writing Kernels...
Generating kernels: Launching 96 threads...
Generating kernels: Done.                                                                                                                                    *
Compiling source kernels: Launching 96 threads...
Compiling source kernels: Done.
# Kernel Building elapsed time = 0.9 secs
Traceback (most recent call last):
  File "/home/jd/ROCm-5.5/rocBLAS/build/release/library/src/../../virtualenv/lib/python3.10/site-packages/Tensile/bin/TensileCreateLibrary", line 43, in <module>
    TensileCreateLibrary()
  File "/home/jd/ROCm-5.5/rocBLAS/build/release/virtualenv/lib/python3.10/site-packages/Tensile/TensileCreateLibrary.py", line 1406, in TensileCreateLibrary
    theMasterLibrary = list(masterLibraries.values())[0]
IndexError: list index out of range
make[2]: *** [library/src/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/build.make:74: Tensile/library/Kernels.so-000-gfx908.hsaco] Error 1
make[2]: *** Deleting file 'Tensile/library/Kernels.so-000-gfx908.hsaco'
make[1]: *** [CMakeFiles/Makefile2:176: library/src/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
{'PRETTY_NAME': 'Ubuntu 22.04.2 LTS', 'NAME': 'Ubuntu', 'VERSION_ID': '22.04', 'VERSION': '22.04.2 LTS (Jammy Jellyfish)', 'VERSION_CODENAME': 'jammy', 'ID': 'ubuntu', 'ID_LIKE': 'debian', 'HOME_URL': 'https://www.ubuntu.com/', 'SUPPORT_URL': 'https://help.ubuntu.com/', 'BUG_REPORT_URL': 'https://bugs.launchpad.net/ubuntu/', 'PRIVACY_POLICY_URL': 'https://www.ubuntu.com/legal/terms-and-policies/privacy-policy', 'UBUNTU_CODENAME': 'jammy', 'NUM_PROC': 96}
Build source path: /home/jd/ROCm-5.5/rocBLAS
cmake -DCMAKE_TOOLCHAIN_FILE=toolchain-linux.cmake -DROCM_DIR:PATH=/opt/rocm -DCPACK_PACKAGING_INSTALL_PREFIX=/opt/rocm -DCMAKE_INSTALL_PREFIX="rocblas-install" -DROCM_PATH=/opt/rocm -DCMAKE_PREFIX_PATH:PATH=/opt/rocm -DCPACK_SET_DESTDIR=OFF -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS="gfx908" -DTensile_CODE_OBJECT_VERSION=default -DTensile_LOGIC=hip_lite -DTensile_TEST_LOCAL_PATH=/home/jd/ROCm-5.5/Tensile/ -DTensile_SEPARATE_ARCHITECTURES=ON -DTensile_LAZY_LIBRARY_LOADING=ON -DTensile_LIBRARY_FORMAT=msgpack -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=ON /home/jd/ROCm-5.5/rocBLAS
make -j96 install
Traceback (most recent call last):
  File "/home/jd/ROCm-5.5/rocBLAS/./rmake.py", line 445, in <module>
    main()
  File "/home/jd/ROCm-5.5/rocBLAS/./rmake.py", line 438, in main
    if run_cmd(exe, opts):
  File "/home/jd/ROCm-5.5/rocBLAS/./rmake.py", line 406, in run_cmd
    proc = subprocess.run(program, check=True, stderr=subprocess.STDOUT, shell=True)
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'make -j96 install' returned non-zero exit status 2.
+ check_exit_code 1
+ ((  1 != 0  ))
+ exit 1
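
The first traceback above fails on `list(masterLibraries.values())[0]`, and an `IndexError` on that expression can only mean that `masterLibraries` was empty when it was indexed. The log does not show why it was empty. Below is a minimal sketch of that failure mode and of a guard that reports it more clearly. The function names and the error message are hypothetical and are not part of Tensile.

```python
# Illustrative only: mirrors the failing expression from the traceback.
# Neither function exists in Tensile; both names are assumptions.

def first_library_unchecked(master_libraries):
    """Same pattern as the traceback: index 0 of the dict's values."""
    return list(master_libraries.values())[0]


def first_library_checked(master_libraries):
    """Return the first library, or fail with a descriptive error if none exist."""
    if not master_libraries:
        raise RuntimeError(
            "no master libraries were produced from the selected logic files"
        )
    return next(iter(master_libraries.values()))


if __name__ == "__main__":
    try:
        first_library_unchecked({})  # an empty dict reproduces the IndexError
    except IndexError as exc:
        print(f"IndexError: {exc}")
```

A guard like this would not fix the build. It only turns the opaque index error into a message that points at the empty library set.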

Environment

Ubuntu 22.04, MI100, ROCm 5.5

git remote -v
rocm-swplat     https://github.com/ROCmSoftwarePlatform/rocBLAS (fetch)
git branch -r
  m/roc-5.5.x -> rocm-5.5.1

rkamd commented 1 year ago

@jdgh000, thanks for reporting this issue. We will investigate and provide you with an update.

jdgh000 commented 1 year ago

Thanks. I also noticed that with asm_lite the build succeeds, but one of the example applications, example_sgemm.cpp, segfaults at runtime. What are hip_lite, asm_lite, and miopen_lite used for? Is there any description of them?

rkamd commented 1 year ago

@jdgh000, hip_lite and asm_lite are files containing a minimal set of source kernels and assembly kernels, respectively. The non-default options are intended only for developer tests, not for general use.

General users should use the default option (asm_full), which provides a complete set of optimized kernels. It might also resolve the segmentation fault reported in your previous comment.

If you still would like to build with hip_lite, then my recommendation would be to use the following command:

./install.sh -dc -l hip_lite --no-lazy-library-loading --merge-architectures

As mentioned, in future releases, this option will be either part of developer options or will be dropped altogether.
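
For anyone scripting the build, the recommendation above can be sketched as a small helper that adds the two extra flags only when hip_lite is requested. The helper name and its parameters are hypothetical. The flags themselves are taken verbatim from the command in this comment, and the thread does not say whether other non-default logic targets need them.

```python
import shlex

# Hypothetical helper, not part of rocBLAS: it only assembles the command
# line recommended in this thread.

def build_install_command(logic="asm_full", base_flags="-dc"):
    """Return the install.sh argument list for a given Tensile logic target."""
    cmd = ["./install.sh", base_flags, "-l", logic]
    if logic == "hip_lite":
        # Extra flags the maintainer recommended for hip_lite builds.
        cmd += ["--no-lazy-library-loading", "--merge-architectures"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_install_command("hip_lite")))
    # prints: ./install.sh -dc -l hip_lite --no-lazy-library-loading --merge-architectures
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the rocBLAS source directory.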

jdgh000 commented 1 year ago

Thanks for the explanation. In that case, I can focus on asm_full and close the Jira ticket. I will see how hip_lite goes with --no-lazy-library-loading.