ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

[Bug]: Failed to build rocblas-5.7.0 with Tensile from source #1363

Closed JiaJiDuan closed 10 months ago

JiaJiDuan commented 10 months ago

Describe the bug

I built rocBLAS with Tensile from source, using the tag rocm-5.7.0. "TensileCreateLibrary" reported the following error:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/usr/lib/python3/dist-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/marco/workspace/rocBLAS/build/release/virtualenv/lib/python3.10/site-packages/Tensile/Parallel.py", line 53, in pcallWithGlobalParamsMultiArg
    return f(*args)
  File "/home/marco/workspace/rocBLAS/build/release/virtualenv/lib/python3.10/site-packages/Tensile/TensileCreateLibrary.py", line 236, in buildSourceCodeObjectFile
    out = subprocess.check_output(compileArgs, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/home/marco/rocm/bin/hipcc.bat'.
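
The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch, not Tensile code: the helper name exec_batch_file_errno and the temporary file contents are assumptions made for illustration. It writes a Windows batch file, marks it executable, and asks subprocess to run it directly. On Linux the kernel cannot execute a file that is neither a native binary nor a script with a "#!" line, so the call is expected to fail with errno.ENOEXEC (Errno 8), the same error reported above.

```python
import errno
import os
import stat
import subprocess
import tempfile


def exec_batch_file_errno():
    """Try to execute a Windows batch file directly.

    Returns the errno of the OSError raised by subprocess, or None if the
    file was executed (which can happen on Windows).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical stand-in for <rocm>/bin/hipcc.bat
        path = os.path.join(tmpdir, "hipcc.bat")
        with open(path, "w") as f:
            f.write("@echo off\r\necho this is a Windows batch file\r\n")
        # Make the file executable so the failure is not a permission error.
        os.chmod(path, stat.S_IRWXU)
        try:
            subprocess.check_output([path], stderr=subprocess.STDOUT)
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    code = exec_batch_file_errno()
    print(code, errno.errorcode.get(code) if code is not None else None)
```

This suggests the traceback is about which file was selected rather than about the compiler itself: any lookup that returns the .bat wrapper on Linux would fail the same way.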

To Reproduce

I built in a native Ubuntu Docker image.
In the source directory, I ran ./install.sh -i -a gfx1102

Expected behavior

On Linux, I would expect the build to find 'hipcc' first instead of 'hipcc.bat'.

Environment

environment.txt

Additional context

After testing, I found that the problem is in a function called 'which', located in Tensile/TensileCreateLibrary.py at line 151:

def which(p):
    exes = [p+x for x in ['.bat', '', '.exe']]  # bat may be front end for file with no extension
    system_path = os.environ['PATH'].split(os.pathsep)
    print(os.path.isfile('/home/marco/rocm/bin/hipcc'))
    if p == 'hipcc' and 'CMAKE_CXX_COMPILER' in os.environ and os.path.isfile(os.environ['CMAKE_CXX_COMPILER']):
        return os.environ['CMAKE_CXX_COMPILER']
    for dirname in system_path+[globalParameters["ROCmBinPath"]]:
        for exe in exes:
            candidate = os.path.join(os.path.expanduser(dirname), exe)
            if os.path.isfile(candidate):
                return candidate
    return None

In my environment, 'CMAKE_CXX_COMPILER' in os.environ evaluates to False, so the early return is skipped. The function then enters the loop, which tries hipcc.bat before hipcc in each directory.
As a temporary fix, I changed the search order to exes = [p+x for x in ['', '.bat', '.exe']]. With that change the build completes normally.
However, I don't know whether such a change is appropriate.
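
One way to avoid changing the Windows behavior is to make the extension order depend on the platform. The sketch below is an illustration only and is not the fix that was merged into Tensile: the names candidate_names and find_program are hypothetical, and the search directories are passed in explicitly instead of being read from PATH and globalParameters["ROCmBinPath"].

```python
import os
import sys


def candidate_names(program, platform=sys.platform):
    """Return the file names to try for `program`, most preferred first."""
    if platform.startswith("win"):
        # On Windows, a .bat file may be the front end for a file with no extension.
        return [program + ext for ext in (".bat", "", ".exe")]
    # Elsewhere, prefer the bare name, matching the reordering tried above.
    return [program + ext for ext in ("", ".bat", ".exe")]


def find_program(program, search_dirs, platform=sys.platform):
    """Return the first existing candidate in search_dirs, or None."""
    for dirname in search_dirs:
        for name in candidate_names(program, platform):
            candidate = os.path.join(os.path.expanduser(dirname), name)
            if os.path.isfile(candidate):
                return candidate
    return None
```

For a directory that contains both hipcc and hipcc.bat, find_program("hipcc", [directory], platform="linux") selects hipcc, while platform="win32" keeps the original preference for hipcc.bat.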

amcamd commented 10 months ago

Thank you, JiaJiDuan, for finding this error. I think the change you proposed is appropriate. I will test your change on Linux and Windows.

JiaJiDuan commented 10 months ago

Hi @amcamd. Thank you for your reply. I am happy to contribute to the project.
I have one more question: OpenMP seems to be required when I build the clients. Is there any way to build them other than with aomp?
I have already built hipcc and don't want to rebuild it from scratch just to support OpenMP.

amcamd commented 10 months ago

Hi @JiaJiDuan. We use the macro _OPENMP to guard the use of OpenMP and make it optional. Can you provide a log for the failing build when OpenMP is not available?

JiaJiDuan commented 10 months ago

Hi @amcamd. I compiled using the command ./install.sh -a gfx1102 -c -d -j 16 and got the following error:

/home/marco/workspace/rocBLAS/build/deps/blis/include/blis/blis.h:18940:10: fatal error: 'omp.h' file not found
#include <omp.h> // skipped
         ^~~~~~~
1 error generated when compiling for gfx1102.

I have installed libomp-dev through apt (apt install libomp-dev). However, I compiled llvm-project and hipcc from source, and that build does not have OpenMP support for amdgpu.
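
A quick way to check whether a particular compiler can locate omp.h is sketched below. This is an assumption-laden illustration, not part of the rocBLAS build: the function name compiler_finds_header is hypothetical, and the probe compiles a one-line source as host C++ with -fsyntax-only (a flag accepted by GCC and Clang), which may not match the include paths hipcc uses when compiling for a GPU target.

```python
import shutil
import subprocess


def compiler_finds_header(compiler, header="omp.h"):
    """Return True if `compiler` can preprocess and parse '#include <header>'.

    Returns False if the compiler is not on PATH or the include fails.
    """
    if shutil.which(compiler) is None:
        return False
    probe = f"#include <{header}>\n"
    result = subprocess.run(
        # Read the source from stdin ("-") and treat it as C++.
        [compiler, "-x", "c++", "-fsyntax-only", "-"],
        input=probe,
        text=True,
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for cc in ("g++", "clang++", "hipcc"):
        print(cc, compiler_finds_header(cc))
```

If a distribution compiler finds the header but a self-built one does not, the header is probably installed in a directory that only the distribution toolchain searches.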

  1. Is OpenMP in the clients used for CPUs or GPUs?
  2. Do I need to build OpenMP support for amdgpu, and if so, how?

amcamd commented 10 months ago

Hi @JiaJiDuan

Answers to questions

  1. OpenMP in clients is for CPU not GPU.
  2. You do not need OpenMP support for amdgpu.

Background

Workaround 1

Switch to using single-threaded AOCL BLIS (without OpenMP) by changing the function install_blis() in rocBLAS/install.sh to the following:

install_blis()
{
    # Download prebuilt AMD single-threaded BLIS (built without OpenMP)
    if [[ ! -e "./blis/lib/libblis.a" ]]; then
      case "${ID}" in
          centos|rhel|sles|opensuse-leap)
              wget -nv -O blis.tar.gz https://github.com/amd/blis/releases/download/2.0/aocl-blis-centos-2.0.tar.gz
              ;;
          ubuntu)
              wget -nv -O blis.tar.gz https://github.com/amd/blis/releases/download/2.0/aocl-blis-ubuntu-2.0.tar.gz
              ;;
          *)
              echo "Unsupported OS for this script"
              wget -nv -O blis.tar.gz https://github.com/amd/blis/releases/download/2.0/aocl-blis-ubuntu-2.0.tar.gz
              ;;
      esac

      tar -xvf blis.tar.gz
      rm -rf blis
      mv amd-blis blis
      rm blis.tar.gz
    fi
}

After you have made this change you will need to delete rocBLAS/build and build again.

Workaround 2

Only try this if Workaround 1 does not work. Workaround 1 is preferred. Create a file omp.h in your include path with the following:

#ifndef MY_OMP
#define MY_OMP
typedef int omp_int_t;
inline omp_int_t omp_get_thread_num() { return 0;}
inline omp_int_t omp_get_max_threads() { return 1;}
#endif

JiaJiDuan commented 10 months ago

Hi @amcamd. Thank you for your answer. I tried both of the solutions you gave me, and they both worked.

What I don't quite understand, though, is why AOCL BLIS is used instead of OpenBLAS. If the CPU library is only used for comparison with the GPU results, OpenBLAS looks sufficient, so using AOCL BLIS doesn't seem necessary.

Also, if my CPU is not AMD, will AOCL BLIS still work properly? Can I use flame/blis directly?

amcamd commented 10 months ago

Hi @JiaJiDuan. The rocBLAS test suite takes hours to run. AOCL BLIS is used in place of OpenBLAS because it runs faster on our test machines. As far as I know, AOCL BLIS will run on both AMD and non-AMD CPUs. You can use flame/blis directly.

JiaJiDuan commented 10 months ago

Hi @amcamd. Thank you. I understand now.

I have no more questions for now, so I will close this issue.
If I have further questions in the future, I will open a new issue.

Thank you again sincerely for your reply.