jlowe opened 9 months ago
RAPIDS may drop support for CentOS 7 in the upcoming release, and has Ubuntu 20.04 as a minimum required version (https://docs.rapids.ai/install#system-req). Does that change what we need to do here?
Or do we still need to ensure the Dockerfile used by the examples is using the same setup as spark-rapids-jni, and do we need to update the spark-rapids-jni setup to account for the new minimum OS versions?
Ref: https://endoflife.software/operating-systems/linux/red-hat-enterprise-linux-rhel
> Or do we still need to ensure the Dockerfile used by the examples is using the same setup as spark-rapids-jni, and do we need to update the spark-rapids-jni setup to account for the new minimum OS versions?
This. Bottom line is the examples need to build in the same environment as spark-rapids-jni does, regardless of what that environment actually is. Note that we still want to build spark-rapids-jni in a way that allows a single binary to run on all supported OS's, and I'm doubtful we can simply build on Ubuntu 20.04's default toolchain to try to satisfy that requirement.
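To make that concern concrete: a library built on a newer distro's default toolchain picks up symbol-version requirements from that distro's glibc/libstdc++ and then fails to load on older OSes. A quick way to inspect the floor of a built library (illustrative commands; the library name here is hypothetical):

```bash
# Highest glibc / libstdc++ symbol versions the library requires; if these
# exceed what the oldest supported distro ships, the single-binary goal breaks.
objdump -T libudfexamplesjni.so | grep -o 'GLIBC_[0-9.]*'   | sort -Vu | tail -1
objdump -T libudfexamplesjni.so | grep -o 'GLIBCXX_[0-9.]*' | sort -Vu | tail -1
```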
If we plan to update the spark-rapids-jni build setup in 24.04, we can do this issue after changing spark-rapids-jni. Have we already decided to drop CentOS 7 in 24.04? If so, let's file another issue in spark-rapids-jni to decide which environment to use for compiling to meet the requirement of a single binary that runs on all supported OSes.
> Have we already decided to drop CentOS 7 in 24.04? If so, let's file another issue in spark-rapids-jni to decide which environment to use for compiling to meet the requirement of a single binary that runs on all supported OSes.
It looks like RAPIDS will deprecate CentOS 7 in 24.04 and stop support in 24.06, per https://github.com/rapidsai/docs/pull/475
For 24.04 we should make sure the Dockerfile used for the examples matches the one used for spark-rapids-jni (CentOS 7 + devtoolset).
In parallel we should figure out what our minimum toolchain will be so we are ready in 24.06.
Hi @YanxuanLiu, is it possible to build the UDF examples with the same Dockerfile the JNI uses?
> Hi @YanxuanLiu, is it possible to build the UDF examples with the same Dockerfile the JNI uses?
Sorry, but I think @NvTimLiu could help with this issue. I haven't dealt with it before.
Hi @NvTimLiu, can you check if it's possible to build the UDF examples with the same Docker image as the JNI?
It would be good for CI to use the same Docker image as the RAPIDS JNI to build the UDF examples.
We have a Dockerfile specified for building the UDF examples: https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile
Shall we remove it and document that we build the UDF examples with the RAPIDS JNI Docker image?
Discussed with Gary: we'll use the same Docker image in the CI job and document a link to the JNI Dockerfile.
I'll handle it.
The Dockerfile used for the RAPIDS accelerated native UDF example build environment is using Ubuntu 18.04, but the build environment used by spark-rapids-jni for the libcudf.so that will be placed in the RAPIDS Accelerator jar is using CentOS 7 + devtoolset. That means code could be crossing the GCC CXX11 ABI streams, leading to failures to find symbols at runtime when trying to load the native UDF shared library, e.g.:
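A representative failure of this class (reconstructed for illustration; the path is hypothetical and the mangled symbol is truncated):

```
java.lang.UnsatisfiedLinkError: /tmp/udf-examples/libudfexamplesjni.so:
undefined symbol: _ZN4cudf13string_scalarC1ERKNSt7__cxx1112basic_string...
```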
which, when run through cu++filt, shows this is a failure to find a constructor along the lines of:
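```
cudf::string_scalar::string_scalar(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ...)
```

The std::__cxx11 namespace in the demangled name marks the new CXX11 std::string ABI; a libcudf built with devtoolset on CentOS 7 exports only the old-ABI variant of this constructor, because devtoolset compilers link against the system libstdc++ and effectively pin _GLIBCXX_USE_CXX11_ABI=0.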
The Dockerfile used by the examples should use the same setup as spark-rapids-jni to avoid this. We should also add a RAPIDS accelerated native UDF that uses a string_scalar with a std::string argument to help catch this ABI mismatch in the future.
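A minimal sketch of what such a canary could look like (file and function names are hypothetical); it simply forces the UDF library to reference the std::string-taking cudf::string_scalar constructor, so an ABI mismatch surfaces as a load-time undefined symbol instead of a latent bug:

```cpp
// abi_canary.cpp -- hypothetical sketch of an ABI-mismatch canary.
// Constructing a cudf::string_scalar from a std::string makes the compiled
// .so reference string_scalar(std::string const&, ...). If libcudf and this
// code were built with different _GLIBCXX_USE_CXX11_ABI settings, loading
// the library fails immediately with an undefined-symbol error like the
// one shown above.
#include <cudf/scalar/scalar.hpp>

#include <string>

bool abi_canary()
{
  std::string const name{"abi-check"};
  cudf::string_scalar scalar{name};  // std::string overload -- the ABI-sensitive call
  return scalar.is_valid();
}
```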