NVIDIA / spark-rapids-jni

RAPIDS Accelerator JNI For Apache Spark
Apache License 2.0

Improve cuda dependencies check during build #2173

Closed pxLi closed 1 week ago

pxLi commented 2 weeks ago

Fixes #2164

Check the shared-object dependencies of libcudf.so right after build and test in the nightly, pre-merge, and submodule-syncup runs, and fail the run if any CUDA-related library is dynamically linked.
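A minimal sketch of the kind of check described above, not the actual contents of ci/check-cuda-dependencies.sh. The function name `needed_matches` and its pattern argument are hypothetical. It reads `objdump -p` output on stdin and reports whether any NEEDED entry matches a pattern:

```shell
#!/usr/bin/env bash
# Sketch only: needed_matches is a hypothetical helper, not part of the PR.

# Reads `objdump -p` output on stdin.
# Exits 0 if any NEEDED entry matches the case-insensitive pattern in $1,
# and 1 otherwise.
needed_matches() {
  local pattern="$1"
  awk -v pat="$pattern" '
    $1 == "NEEDED" && tolower($2) ~ tolower(pat) { found = 1 }
    END { exit !found }
  '
}

# Example usage (the path is illustrative):
#   if objdump -p /tmp/jni/libcudf.so | needed_matches cuda; then
#     echo "dynamic link to CUDA lib found"; exit 1
#   fi
```

A single awk process reads all of its input before exiting, so the result does not depend on how an early-exiting `grep -q` interacts with `set -o pipefail`.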

Failing test run with revert commit https://github.com/NVIDIA/spark-rapids-jni/pull/2165:

[2024-06-26T05:25:43.508Z] + . ci/check-cuda-dependencies.sh target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11-arm64.jar
[2024-06-26T05:25:43.508Z] ++ set -exo pipefail
[2024-06-26T05:25:43.508Z] ++ jar_path=target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11-arm64.jar
[2024-06-26T05:25:43.508Z] +++ date +%Y%m%d%H%M%S
[2024-06-26T05:25:43.508Z] ++ tmp_path=/tmp/jni-20240626052541
[2024-06-26T05:25:43.508Z] ++ unzip -j target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11-arm64.jar '*64/Linux/libcudf.so' -d /tmp/jni-20240626052541
[2024-06-26T05:25:43.508Z] Archive:  target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11-arm64.jar
[2024-06-26T05:25:48.767Z]   inflating: /tmp/jni-20240626052541/libcudf.so  
[2024-06-26T05:25:48.767Z] ++ objdump -p /tmp/jni-20240626052541/libcudf.so
[2024-06-26T05:25:48.767Z] ++ grep NEEDED
[2024-06-26T05:25:48.767Z] ++ grep -q cuda
[2024-06-26T05:25:48.767Z] ++ echo 'dynamical link to CUDA lib found in libcudf.so...'
[2024-06-26T05:25:48.767Z] dynamical link to CUDA lib found in libcudf.so...
[2024-06-26T05:25:48.767Z] ++ ldd /tmp/jni-20240626052541/libcudf.so
[2024-06-26T05:25:48.767Z]  linux-vdso.so.1 (0x0000ffffbde00000)
[2024-06-26T05:25:48.767Z]  librt.so.1 => /usr/lib64/librt.so.1 (0x0000ffff9618f000)
[2024-06-26T05:25:48.768Z]  libz.so.1 => /usr/lib64/libz.so.1 (0x0000ffff9615e000)
[2024-06-26T05:25:48.768Z]  libnvcomp.so => /home/jenkins/agent/workspace/peixinl-spark-rapids-jni_nightly-dev/target/libcudf-install/lib64/libnvcomp.so (0x0000ffff95331000)
[2024-06-26T05:25:48.768Z]  libnvcomp_gdeflate.so => /home/jenkins/agent/workspace/peixinl-spark-rapids-jni_nightly-dev/target/libcudf-install/lib64/libnvcomp_gdeflate.so (0x0000ffff94373000)
[2024-06-26T05:25:48.768Z]  libnvcomp_bitcomp.so => /home/jenkins/agent/workspace/peixinl-spark-rapids-jni_nightly-dev/target/libcudf-install/lib64/libnvcomp_bitcomp.so (0x0000ffff93074000)
[2024-06-26T05:25:48.768Z]  libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x0000ffff92fbf000)
[2024-06-26T05:25:48.768Z]  libdl.so.2 => /usr/lib64/libdl.so.2 (0x0000ffff92f9e000)
[2024-06-26T05:25:48.768Z]  libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x0000ffff92f69000)
[2024-06-26T05:25:48.768Z]  libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000ffff92dc5000)
[2024-06-26T05:25:48.768Z]  libm.so.6 => /usr/lib64/libm.so.6 (0x0000ffff92d04000)
[2024-06-26T05:25:48.768Z]  libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x0000ffff92cd3000)
[2024-06-26T05:25:48.768Z]  libc.so.6 => /usr/lib64/libc.so.6 (0x0000ffff92b5d000)
[2024-06-26T05:25:48.768Z]  /lib/ld-linux-aarch64.so.1 (0x0000ffffbddc2000)

Passing run with the latest change, when no CUDA libs are dynamically linked:

[2024-06-26T06:34:31.760Z] ++ unzip -j target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11.jar '*64/Linux/libcudf.so' -d /tmp/jni-20240626063429
[2024-06-26T06:34:31.760Z] Archive:  target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11.jar
[2024-06-26T06:34:35.927Z]   inflating: /tmp/jni-20240626063429/libcudf.so  
[2024-06-26T06:34:35.927Z] ++ objdump -p /tmp/jni-20240626063429/libcudf.so
[2024-06-26T06:34:35.927Z] ++ grep -q cuda
[2024-06-26T06:34:35.927Z] ++ grep NEEDED
[2024-06-26T06:34:35.927Z] ++ echo 'no dynamical link to CUDA lib found in libcudf.so'
[2024-06-26T06:34:35.927Z] no dynamical link to CUDA lib found in libcudf.so

Verified internally with cuda{11,12} on CPU arch {amd64,aarch64}.

pxLi commented 2 weeks ago

build

pxLi commented 2 weeks ago

Updated:

  1. Check all packaged *.so files in the cpu_arch path.
  2. grep for cudart instead of cuda.
  3. Do not hardcode cuda.version.
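The per-file loop from the list above could look roughly like the sketch below. This is an illustration under assumptions, not the script from this PR: `check_dir` and the `OBJDUMP` override are hypothetical names, and a glob replaces `find` because `unzip -j` already flattens the extracted files into one directory.

```shell
#!/usr/bin/env bash
# Sketch only: check_dir and the OBJDUMP override are hypothetical names.
OBJDUMP="${OBJDUMP:-objdump}"

# Checks every *.so directly under the directory given in $1.
# Returns 1 if any of them lists a cudart library as NEEDED, else 0.
check_dir() {
  local dir="$1" so_file status=0
  for so_file in "$dir"/*.so; do
    [ -e "$so_file" ] || continue   # the glob matched nothing
    if "$OBJDUMP" -p "$so_file" |
       awk '$1 == "NEEDED" && tolower($2) ~ /cudart/ { f = 1 } END { exit !f }'
    then
      echo "Dynamic link to CUDA Runtime found in $so_file"
      status=1
    else
      echo "No dynamic link to CUDA Runtime found in $so_file"
    fi
  done
  return "$status"
}

# Example usage (the directory is illustrative):
#   check_dir /tmp/jni-extracted || exit 1
```

Collecting the status and returning it after the loop means every library is reported before the run fails, rather than stopping at the first offender.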

amd64 list:

[2024-06-27T01:36:14.229Z] ++ unzip -j target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11.jar '*64/Linux/*.so' -d /tmp/jni-20240627013613
[2024-06-27T01:36:14.229Z] Archive:  target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda11.jar
[2024-06-27T01:36:20.806Z]   inflating: /tmp/jni-20240627013613/libcudf.so  
[2024-06-27T01:36:20.806Z]   inflating: /tmp/jni-20240627013613/libcudfjni.so  
[2024-06-27T01:36:20.806Z]   inflating: /tmp/jni-20240627013613/libcufilejni.so  
[2024-06-27T01:36:21.078Z]   inflating: /tmp/jni-20240627013613/libprofilerjni.so  
[2024-06-27T01:36:21.078Z]   inflating: /tmp/jni-20240627013613/libnvcomp.so  
[2024-06-27T01:36:21.348Z]   inflating: /tmp/jni-20240627013613/libnvcomp_gdeflate.so  
[2024-06-27T01:36:21.616Z]   inflating: /tmp/jni-20240627013613/libnvcomp_bitcomp.so  

arm64 list:

[2024-06-27T01:28:17.998Z] ++ unzip -j target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda12-arm64.jar '*64/Linux/*.so' -d /tmp/jni-20240627012817
[2024-06-27T01:28:17.998Z] Archive:  target/spark-rapids-jni-24.08.0-SNAPSHOT-cuda12-arm64.jar
[2024-06-27T01:28:17.998Z]   inflating: /tmp/jni-20240627012817/libcudfjni.so  
[2024-06-27T01:28:24.552Z]   inflating: /tmp/jni-20240627012817/libcudf.so  
[2024-06-27T01:28:24.552Z]   inflating: /tmp/jni-20240627012817/libnvcomp_bitcomp.so  
[2024-06-27T01:28:24.552Z]   inflating: /tmp/jni-20240627012817/libnvcomp_gdeflate.so  
[2024-06-27T01:28:24.809Z]   inflating: /tmp/jni-20240627012817/libnvcomp.so  

Then check all .so files:

[2024-06-27T01:36:21.616Z] ++ find /tmp/jni-20240627013613 -type f -name '*.so'
[2024-06-27T01:36:21.616Z] ++ read -r so_file
[2024-06-27T01:36:21.616Z] ++ grep -qi cudart
[2024-06-27T01:36:21.616Z] ++ grep NEEDED
[2024-06-27T01:36:21.616Z] ++ objdump -p /tmp/jni-20240627013613/libcudf.so
[2024-06-27T01:36:21.880Z] ++ echo 'No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libcudf.so'
[2024-06-27T01:36:21.880Z] No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libcudf.so
[2024-06-27T01:36:21.880Z] ++ read -r so_file
[2024-06-27T01:36:21.880Z] ++ grep -qi cudart
[2024-06-27T01:36:21.880Z] ++ objdump -p /tmp/jni-20240627013613/libcufilejni.so
[2024-06-27T01:36:21.880Z] ++ grep NEEDED
[2024-06-27T01:36:21.880Z] ++ echo 'No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libcufilejni.so'
[2024-06-27T01:36:21.880Z] No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libcufilejni.so
[2024-06-27T01:36:21.880Z] ++ read -r so_file
[2024-06-27T01:36:21.880Z] ++ grep -qi cudart
[2024-06-27T01:36:21.880Z] ++ grep NEEDED
[2024-06-27T01:36:21.880Z] ++ objdump -p /tmp/jni-20240627013613/libnvcomp_gdeflate.so
[2024-06-27T01:36:21.880Z] ++ echo 'No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libnvcomp_gdeflate.so'
[2024-06-27T01:36:21.880Z] No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libnvcomp_gdeflate.so
[2024-06-27T01:36:21.880Z] ++ read -r so_file
[2024-06-27T01:36:21.880Z] ++ grep -qi cudart
[2024-06-27T01:36:21.880Z] ++ objdump -p /tmp/jni-20240627013613/libnvcomp.so
[2024-06-27T01:36:21.880Z] ++ grep NEEDED
[2024-06-27T01:36:21.880Z] ++ echo 'No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libnvcomp.so'
[2024-06-27T01:36:21.880Z] No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libnvcomp.so
[2024-06-27T01:36:21.880Z] ++ read -r so_file
[2024-06-27T01:36:21.880Z] ++ grep -qi cudart
[2024-06-27T01:36:21.880Z] ++ objdump -p /tmp/jni-20240627013613/libprofilerjni.so
[2024-06-27T01:36:21.880Z] ++ grep NEEDED
[2024-06-27T01:36:21.880Z] ++ echo 'No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libprofilerjni.so'
[2024-06-27T01:36:21.880Z] No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libprofilerjni.so
[2024-06-27T01:36:21.880Z] ++ read -r so_file
[2024-06-27T01:36:21.880Z] ++ grep -qi cudart
[2024-06-27T01:36:21.880Z] ++ objdump -p /tmp/jni-20240627013613/libnvcomp_bitcomp.so
[2024-06-27T01:36:21.880Z] ++ grep NEEDED
[2024-06-27T01:36:21.880Z] ++ echo 'No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libnvcomp_bitcomp.so'
[2024-06-27T01:36:21.880Z] No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libnvcomp_bitcomp.so
[2024-06-27T01:36:21.880Z] ++ read -r so_file
[2024-06-27T01:36:21.880Z] ++ grep -qi cudart
[2024-06-27T01:36:21.880Z] ++ grep NEEDED
[2024-06-27T01:36:21.880Z] ++ objdump -p /tmp/jni-20240627013613/libcudfjni.so
[2024-06-27T01:36:21.880Z] ++ echo 'No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libcudfjni.so'
[2024-06-27T01:36:21.880Z] No dynamic link to CUDA Runtime found in /tmp/jni-20240627013613/libcudfjni.so
[2024-06-27T01:36:21.880Z] ++ read -r so_file