ROCm / ROCm-OpenCL-Runtime

ROCm OpenCL Runtime

Building devprogram.cpp.o fails #64

Open FinnStokes opened 5 years ago

FinnStokes commented 5 years ago

Since 184c0efb3ad33c7326850fd8d790a3822e62a302, I have been having issues building the OpenCL runtime on master. With the rock-dkms package installed and ROCm compiled from source, I run

~/bin/repo init -u https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime.git -b master -m opencl.xml
~/bin/repo sync
cd opencl
mkdir build
cd build
cmake3 -DCMAKE_INSTALL_PREFIX=/opt/rocm/opencl -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
make

The compilation runs fine all through LLVM and Clang but fails when it reaches oclruntime, with the error

[ 96%] Building CXX object runtime/CMakeFiles/oclruntime.dir/device/devprogram.cpp.o
In file included from /home/fstokes/OpenCL2/opencl/runtime/device/devprogram.cpp:16:0:
/home/fstokes/OpenCL2/opencl/build/runtime/device/rocm/libraries.amdgcn.inc:2:10: fatal error: oclc_correctly_rounded_sqrt_off.amdgcn.inc: No such file or directory
 #include "oclc_correctly_rounded_sqrt_off.amdgcn.inc"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It seems like there is an issue with the dependencies computed by cmake. runtime/device/devprogram.cpp (which was added in f6629e2b7e691bc648ba928c26fd5cd204f13c53 / 184c0efb3ad33c7326850fd8d790a3822e62a302) includes libraries.amdgcn.inc, which in turn includes a number of dynamically generated amdgcn.inc files. These files have not yet been generated when make tries to compile devprogram.cpp.

My current workaround is running

pushd runtime/device/rocm
make
popd
make

after make fails for the first time. This generates the missing include files before continuing. However, I do not know enough about cmake to determine what the actual fix should be to get this dependency ordering correct.

I've attached my cmake and make output in case it is relevant, but I think this is a bug in the cmake configuration that should not be specific to my setup.

jlgreathouse commented 5 years ago

Hi @FinnStokes

The problem you're running into here is that our master branch repo manifest.xml currently pulls from master on many of the sub-projects. This is incorrect, because this means that it will try to pull in changes that have not actually been tested against our OpenCL runtime. The correct thing has been pushed to our roc-2.0.x branch, where we have "pinned" the manifest to the correct commits in other projects.

If you're still interested in building the ROCm OpenCL runtime, you might want to check out our Experimental ROC project and use the component build scripts for your distro to build it. For example, if you are on Ubuntu 18.04, you could run Experimental_ROC/distro_install_scripts/Ubuntu/Ubuntu_18.04/src_install/component_scripts/01_07_opencl.sh. You can see the arguments to these scripts in the README files. The branches in Experimental ROC correspond to particular ROCm releases.

FinnStokes commented 5 years ago

Hi @jlgreathouse

I tried building the roc-2.0.x branch. Because the 2.0.0 tag does not include the fix to RadeonOpenCompute/ROCm-OpenCL-Driver#76, I had to patch the relevant CMakeLists.txt. When it got to compiling oclruntime, it once again had the same error:

~/bin/repo init -u https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime.git -b roc-2.0.x -m opencl.xml
~/bin/repo sync
cd opencl/
sed -i -e 's/link_directories(${binary_dir}\/googletest)/link_directories(${binary_dir}\/lib)/' compiler/driver/src/unittest/CMakeLists.txt
mkdir build
cd build
scl enable devtoolset-7 bash
cmake3 -DCMAKE_INSTALL_PREFIX=/opt/rocm/opencl -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
make
...
[ 94%] Building CXX object runtime/CMakeFiles/oclruntime.dir/device/devprogram.cpp.o
In file included from /home/fstokes/OpenCL2/opencl/runtime/device/devprogram.cpp:16:0:
/home/fstokes/OpenCL2/opencl/build/runtime/device/rocm/libraries.amdgcn.inc:2:10: fatal error: oclc_correctly_rounded_sqrt_off.amdgcn.inc: No such file or directory
 #include "oclc_correctly_rounded_sqrt_off.amdgcn.inc"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

On the other hand, the Experimental ROC script builds fine. It turns out that the only difference that seems to matter is the use of the -j flag to parallelise the build across multiple cores. With -j 12, the build runs fine for master, the roc-2.0.x branch, or via the Experimental ROC script, whereas building single-threaded fails on all three.

ulyssesrr commented 5 years ago

Hi @jlgreathouse, I'm maintaining a rocm-opencl-runtime AUR package and some users are reporting this issue (even after the repo pinning). These users often say that building in parallel solves the issue. Since I have always built with -j8, I was never hit by it.

After some debugging, it seems that the target that generates the missing header is added as dependency of target oclrocm in runtime/device/rocm/CMakeLists.txt

add_custom_target(${header}_target ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${header})
add_dependencies(oclrocm  ${header}_target)

devprogram.cpp is part of the oclruntime target, and I couldn't find any direct or indirect dependency between oclruntime and oclrocm that would ensure the proper build order, so I added one in runtime/CMakeLists.txt: https://aur.archlinux.org/cgit/aur.git/tree/fix_rocm_opencl_build_order.patch?h=rocm-opencl-runtime&id=ad3d0221a63257abe0cc92f527d152156e6aefc5

Now the build seems to be stable. Could you verify please?
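
For readers who cannot open the linked patch, a build-order fix of the kind described above might look like the sketch below. It is an illustration only, not the contents of the patch. It assumes that oclruntime and oclrocm are both ordinary CMake targets reachable from runtime/CMakeLists.txt, and that oclrocm already depends on the generated-header targets as in the snippet quoted above.

```cmake
# runtime/CMakeLists.txt (illustrative sketch; the linked patch may differ)
#
# Assumption: the oclruntime target has already been defined at this point,
# and oclrocm is defined in runtime/device/rocm/CMakeLists.txt.
#
# oclrocm depends on every ${header}_target custom target, so making
# oclruntime depend on oclrocm forces the generated *.amdgcn.inc headers
# to exist before devprogram.cpp is compiled, even in a serial build.
add_dependencies(oclruntime oclrocm)
```

With an explicit dependency like this, the build order no longer relies on a parallel build happening to run the header-generation targets before oclruntime is compiled.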

acowley commented 5 years ago

I did something related for NixOS for the same issue.