NVIDIA / spark-rapids-jni

RAPIDS Accelerator JNI For Apache Spark
Apache License 2.0

[BUG] Fail to patch CCCL in `versions.json` #2582

Open ttnghia opened 1 week ago

ttnghia commented 1 week ago

When compiling spark-rapids-jni, patching CCCL fails:

[INFO]      [exec] -- rapids-cmake [CCCL]: failed to apply diff cccl_symbol_visibility.diff
[INFO]      [exec] -- rapids-cmake [CCCL]: git diff output: error: can't open patch 'jni/thirdparty/cudf-pins/cccl_symbol_visibility.diff': No such file or directory
[INFO]      [exec] -- 
[INFO]      [exec] -- rapids-cmake [CCCL]: failed to apply diff thrust_disable_64bit_dispatching.diff
[INFO]      [exec] -- rapids-cmake [CCCL]: git diff output: error: can't open patch 'jni/thirdparty/cudf-pins/thrust_disable_64bit_dispatching.diff': No such file or directory
[INFO]      [exec] -- 
[INFO]      [exec] -- rapids-cmake [CCCL]: failed to apply diff thrust_faster_sort_compile_times.diff
[INFO]      [exec] -- rapids-cmake [CCCL]: git diff output: error: can't open patch 'jni/thirdparty/cudf-pins/thrust_faster_sort_compile_times.diff': No such file or directory
[INFO]      [exec] -- 
[INFO]      [exec] -- rapids-cmake [CCCL]: failed to apply diff thrust_faster_scan_compile_times.diff
[INFO]      [exec] -- rapids-cmake [CCCL]: git diff output: error: can't open patch 'jni/thirdparty/cudf-pins/thrust_faster_scan_compile_times.diff': No such file or directory

Without the patches, the build still completes successfully, but the resulting binary may have issues because CCCL was built unpatched.
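Because the build does not fail on its own, one way to catch this early would be to scan the build output for the failure message. The following is only a minimal sketch: the regular expression is derived from the log lines quoted above, and `find_failed_patches` is a hypothetical helper, not part of rapids-cmake or spark-rapids-jni.

```python
import re

# Matches lines such as:
#   -- rapids-cmake [CCCL]: failed to apply diff cccl_symbol_visibility.diff
# The pattern is inferred from the log excerpt in this issue (an assumption,
# not a documented rapids-cmake output format).
FAILED_PATCH = re.compile(
    r"rapids-cmake \[(?P<package>[^\]]+)\]: failed to apply diff (?P<patch>\S+)"
)


def find_failed_patches(log_text):
    """Return (package, patch file) pairs for every patch that failed to apply."""
    return [
        (match.group("package"), match.group("patch"))
        for match in FAILED_PATCH.finditer(log_text)
    ]


sample_log = """\
[INFO]      [exec] -- rapids-cmake [CCCL]: failed to apply diff cccl_symbol_visibility.diff
[INFO]      [exec] -- rapids-cmake [CCCL]: git diff output: error: can't open patch 'jni/thirdparty/cudf-pins/cccl_symbol_visibility.diff': No such file or directory
[INFO]      [exec] -- rapids-cmake [CCCL]: failed to apply diff thrust_disable_64bit_dispatching.diff
"""

failed = find_failed_patches(sample_log)
for package, patch in failed:
    print(f"{package}: {patch}")
```

A CI step could treat a non-empty result as an error, so an unpatched CCCL build would not pass silently.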

pxLi commented 1 week ago

${current_json_dir} is evaluated at runtime and points to the directory holding the pinned versions.json file: https://github.com/rapidsai/rapids-cmake/blob/branch-24.12/rapids-cmake/cpm/detail/generate_patch_command.cmake#L61

In our case it should instead point to the cudf patches directory: https://github.com/rapidsai/cudf/tree/branch-24.12/cpp/cmake/thirdparty/patches

We may need to either move the patches from the cudf folder into cudf-pins/ or replace the ${current_json_dir} variable during submodule-sync.
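The second option could look roughly like the sketch below, which replaces the `${current_json_dir}` placeholder with a concrete patch directory. The JSON fragment and the `/path/to/cudf` checkout location are assumptions made for illustration, and `expand_placeholder` is a hypothetical helper rather than anything in the existing submodule-sync tooling. The walk visits every string value, so it does not depend on the exact layout of versions.json.

```python
import json

PLACEHOLDER = "${current_json_dir}"


def expand_placeholder(node, patch_dir):
    """Recursively replace the placeholder in every string of a parsed JSON tree."""
    if isinstance(node, str):
        return node.replace(PLACEHOLDER, patch_dir)
    if isinstance(node, list):
        return [expand_placeholder(item, patch_dir) for item in node]
    if isinstance(node, dict):
        return {key: expand_placeholder(value, patch_dir) for key, value in node.items()}
    return node  # numbers, booleans and null pass through unchanged


# Hypothetical fragment of a pinned versions.json (layout assumed for illustration).
pinned = json.loads("""
{
  "packages": {
    "CCCL": {
      "patches": [
        {"file": "${current_json_dir}/cccl_symbol_visibility.diff"}
      ]
    }
  }
}
""")

# Assumed location of the cudf checkout; adjust to the real submodule path.
cudf_patch_dir = "/path/to/cudf/cpp/cmake/thirdparty/patches"

rewritten = expand_placeholder(pinned, cudf_patch_dir)
print(json.dumps(rewritten, indent=2))
```

Writing absolute paths into the pinned file would make it machine-specific, so a path relative to the repository root may be the better choice if the consumer of the file supports it.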

jlowe commented 1 week ago

@robertmaynard this appears to be a bug in the rapids-cmake code that generates versions.json when patch files are referenced for a dependency. It's using ${current_json_dir}/ for the patch file locations, when they are not co-located with versions.json. Seems like ${current_json_dir} should be expanded, per dependency, when emitting this file.