This issue records the obstacles and solutions I encountered during the build process. I hope the maintainers can update the build script accordingly to make the whole process smoother.
Environment
docker: 27.1.0
image: nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04
setup command:
sudo docker run -dit --gpus all \
-v .:/root \
--privileged --network=host --ipc=host \
--name phos nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04
Waiting for user input
I used the commands in the README to build:
./build.sh -3 -i
It got stuck during the installation of software-properties-common because the process requires user input to confirm time zone information, but there is no way to provide input.
Solution: Manually install software-properties-common beforehand, or set the TZ and DEBIAN_FRONTEND environment variables so the installation runs non-interactively.
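A minimal sketch of the non-interactive route (the explicit tzdata package and the Etc/UTC value are assumptions; adjust the time zone as needed):
export DEBIAN_FRONTEND=noninteractive
export TZ=Etc/UTC
apt-get update && apt-get install -y tzdata software-properties-common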
Missing ~/.cargo/env
After completing the first stage of the installation, the script prompted me to source ~/.bashrc. However, after sourcing it, I found that ~/.cargo/env was missing.
Solution: Install the Rust toolchain manually:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
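After the installer finishes, the missing file should exist and can be sourced so that cargo and rustc are on PATH (assuming the default rustup install location):
source "$HOME/.cargo/env"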
Missing header files
When building the Autogen and Remoting components, the process failed, and the log indicated that some header files were missing (see build_log/build_PhOS-Autogen.log and build_log/build_PhOS-Remoting.log for details):
../../pos/cuda_impl/utils/fatbin.h:26:10: fatal error: libelf.h: No such file or directory
26 | #include <libelf.h>
| ^~~~~~~~~~
cpu-utils.c:9:10: fatal error: openssl/md5.h: No such file or directory
9 | #include <openssl/md5.h>
| ^~~~~~~~~~~~~~~
cpu-client-driver.c:7:10: fatal error: vdpau/vdpau.h: No such file or directory
7 | #include <vdpau/vdpau.h>
| ^~~~~~~~~~~~~~~
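Solution: Install the missing header files. On Ubuntu 20.04 the three headers above should come from the following packages (package names inferred from the error messages):
apt-get install -y libelf-dev libssl-dev libvdpau-dev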
Missing dynamic library
After completing the installation, I tried to launch the hijack library using LD_PRELOAD, but it failed due to a missing libtirpc.so.3. I could only find /usr/lib/x86_64-linux-gnu/libtirpc.so.
Solution: Run the ldconfig command to regenerate the shared-library cache and the versioned libtirpc.so.3 link.
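A minimal sketch (the manual symlink is only a fallback assumption for the case where ldconfig does not create the versioned name):
sudo ldconfig
# fallback: point the versioned name at the library found above
sudo ln -s /usr/lib/x86_64-linux-gnu/libtirpc.so /usr/lib/x86_64-linux-gnu/libtirpc.so.3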
Hijacking failed
I tested the hijack with a hello-world CUDA program, but no runtime APIs were hijacked. Running ldd on the binary showed no CUDA runtime library among its dependencies; it seems that nvcc statically links the CUDA runtime into the user program binary by default.
Solution: Pass the --cudart=shared argument to nvcc to force dynamic linking of the CUDA runtime in the user program.
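A minimal sketch, assuming the test program is hello.cu (the file and output names are placeholders):
nvcc --cudart=shared -o hello hello.cu
ldd ./hello | grep libcudart   # libcudart.so should now show up as a dynamic dependency
Preloading the hijack library against this binary should then intercept the runtime calls.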
Nice project!