hcho3 closed this pull request 4 months ago.
Hi! This is the friendly automated conda-forge-linting service.
I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.
Here's what I've got...
For recipe/meta.yaml:
noarch packages can't have selectors. If the selectors are necessary, please remove noarch: python.
@jakirkham I reverted the lint-related changes. Can we ignore the lint error for now, then?
Yes, please feel free to. I was planning to look more closely once we sorted out the build issues. Sorry if that was distracting.
Do you have thoughts on the error we are seeing here ( https://github.com/conda-forge/xgboost-feedstock/pull/167#issuecomment-2246723367 )?
@jakirkham I added ${CUDAToolkit_INCLUDE_DIRS} back to the include paths, but with the SYSTEM flag. This way, the CTK path will be searched after the other specified include paths.
Additionally, system include directories are searched after normal include directories regardless of the order specified.
https://cmake.org/cmake/help/latest/command/target_include_directories.html
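For reference, the change amounts to something like this on the CMake side (a minimal sketch; objxgboost is XGBoost's object library target, and the exact wiring in the real build files may differ):

```cmake
# Sketch: marking the CUDA Toolkit headers as SYSTEM means the compiler
# searches them after the project's normal include paths (nvcc receives
# them as -isystem rather than -I).
find_package(CUDAToolkit REQUIRED)
target_include_directories(objxgboost SYSTEM PRIVATE ${CUDAToolkit_INCLUDE_DIRS})
```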
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.
@conda-forge-admin , please re-render
Seeing this error on CI:
FAILED: src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o
$BUILD_PREFIX/bin/nvcc -forward-unknown-to-host-compiler -ccbin=$BUILD_PREFIX/bin/x86_64-conda-linux-gnu-c++ -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_CUSTOMIZE=1 -DDMLC_USE_CXX11=1 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DXGBOOST_BUILTIN_PREFETCH_PRESENT=1 -DXGBOOST_MM_PREFETCH_PRESENT=1 -DXGBOOST_USE_CUDA=1 -DXGBOOST_USE_NCCL=1 -D_MWAITXINTRIN_H_INCLUDED -D__USE_XOPEN2K8 -I$SRC_DIR/include -I$SRC_DIR/dmlc-core/include -I$SRC_DIR/rabit/include -I$SRC_DIR/gputreeshap -I$PREFIX/include -I$SRC_DIR/build-target/dmlc-core/include -isystem $BUILD_PREFIX/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_50,code=[sm_50]" "--generate-code=arch=compute_60,code=[sm_60]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_90,code=[sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler=-fopenmp -Xfatbin=-compress-all --default-stream per-thread -lineinfo -MD -MT src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o -MF src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o.d -x cu -c $SRC_DIR/src/tree/gpu_hist/evaluate_splits.cu -o src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o
$SRC_DIR/src/tree/gpu_hist/../updater_gpu_common.cuh(57): error: identifier "FLT_MAX" is undefined
Guessing we want to apply this patch: https://github.com/dmlc/xgboost/pull/10574 (presumably the file needs an explicit #include <cfloat>, which defines FLT_MAX and is evidently no longer pulled in transitively by the headers it includes).
Am seeing this error in the Windows CUDA 11.8 builds:
Internal error: assertion failed at: "overload.c", line 28946 in deduce_class_template_args
1 catastrophic error detected in the compilation of "D:/bld/xgboost-split_1721846306365/work/src/data/ellpack_page_source.cu".
Compilation aborted.
ellpack_page_source.cu
nvcc error : 'cudafe++' died with status 0xC0000409
Is this related to the patch we are discussing above ( https://github.com/conda-forge/xgboost-feedstock/pull/167#discussion_r1690279847 )? If so, maybe this is a good reason to consider dropping it and restricting CCCL instead
> Is this related to the patch we are discussing above ( https://github.com/conda-forge/xgboost-feedstock/pull/167#discussion_r1690279847 )? If so, maybe this is a good reason to consider dropping it and restricting CCCL instead
Not sure; we can first try restricting CCCL and see if the internal error goes away.
The good news is that most of the errors we saw before appear to be clearing out! 🥳 Thanks for all the patches to make that happen ❤️
The bad news is we are still seeing the new Windows compilation issue mentioned above: https://github.com/conda-forge/xgboost-feedstock/pull/167#issuecomment-2248731210
@conda-forge-admin , please re-render
The Windows CUDA 12 build has this error on CI:
D:\bld\xgboost-split_1721864376122\_build_env\Library\include\crt/host_config.h(153):
fatal error C1189: #error: -- unsupported Microsoft Visual Studio version!
Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc
flag '-allow-unsupported-compiler' can be used to override this version
check; however, using an unsupported host compiler may cause compilation
failure or incorrect run time execution. Use at your own risk.
This is odd, as it says 2022 should be supported, and that is what we are using.
Will wait and see if this helps Windows CUDA 11.8 before spending more time on this angle.
Edit: No luck. Same issue with CUDA 11.8.
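For completeness: the error message suggests -allow-unsupported-compiler as an escape hatch. If we wanted to gather more data, it could be passed through CMake along these lines (a hypothetical diagnostic sketch, not something the recipe currently does):

```cmake
# Hypothetical diagnostic only: let nvcc accept a host compiler outside
# the range it was qualified against. Per nvcc's own warning, this may
# miscompile, so it is only useful for narrowing down the failure.
if(MSVC)
  string(APPEND CMAKE_CUDA_FLAGS " -allow-unsupported-compiler")
endif()
```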
@conda-forge-admin , please re-render
I think the error comes from a compiler bug in NVCC 11.8. I was able to repro the same error locally with NVCC 11.8 + MSVC 2022.
Let's consider dropping 11.8 for Windows then?
@conda-forge-admin , please re-render
Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.
This message was generated by GitHub actions workflow run https://github.com/conda-forge/xgboost-feedstock/actions/runs/10086710216.
@conda-forge-admin , please re-render
> Let's consider dropping 11.8 for Windows then?
If the latest change doesn't work, then we can turn this into an issue and skip it
@conda-forge-admin , please re-render
Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.
This message was generated by GitHub actions workflow run https://github.com/conda-forge/xgboost-feedstock/actions/runs/10088820119.
Hi! This is the friendly automated conda-forge-linting service.
I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.
Here's what I've got...
For recipe/meta.yaml:
noarch packages can't have skips with selectors. If the selectors are necessary, please remove noarch: python.
Thanks Hyunsu
Looks good. Please feel free to add the automerge label when you are ready.
The bot will then merge when CI completes. If that doesn't happen for some reason, it will leave a comment letting us know why
Hi! This is the friendly conda-forge automerge bot!
I considered the following status checks when analyzing this PR:
Thus the PR was not passing and not merged.
Thanks Hyunsu! 🙏
Hi @jakirkham @hcho3, is it normal to have this under a CUDA 12.5 environment:
+ libxgboost 2.1.0 cuda120_h9dfd3e9_1 conda-forge/linux-64 93MB
+ xgboost 2.1.0 cuda118_pyh98e67c5_1 conda-forge/noarch 17kB
I'm updating an environment that has XGBoost as a dependency. The above two lines are part of the mamba output, and it seems a bit weird to me to have mismatched CUDA versions.
Second question, since all C++ includes are distributed, maybe it makes sense to distribute the CMake config files as well?
@trivialfis this is a good question
AIUI in terms of CUDA usage, only libxgboost has CUDA bits. So xgboost doesn't have CUDA bits (as it is pure Python)
That said, it certainly is confusing to have xgboost mislabeled in this way. Also, users that constrain xgboost based on CUDA version may expect this to carry through to libxgboost
So agree we should fix this. Though think the risk is relatively low. Started doing this in PR: https://github.com/conda-forge/xgboost-feedstock/pull/188
> Second question, since all C++ includes are distributed, maybe it makes sense to distribute the CMake config files as well?
Also sorry for not answering the CMake question earlier
Thank you for opening a new issue to discuss: https://github.com/conda-forge/xgboost-feedstock/issues/189
Hopefully the answers there are helpful
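For context on why shipping the CMake config files would help: downstream C++ projects could then consume the library with a plain find_package. A hypothetical consumer sketch (the exported target name xgboost::xgboost is an assumption, not confirmed from the packaged files):

```cmake
# Hypothetical downstream CMakeLists.txt, assuming the packaged config
# exports a target named xgboost::xgboost.
cmake_minimum_required(VERSION 3.18)
project(my_app LANGUAGES CXX)

find_package(xgboost REQUIRED)

add_executable(my_app main.cc)
target_link_libraries(my_app PRIVATE xgboost::xgboost)
```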
Checklist
- Reset the build number to 0 (if the version changed)
- Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)

Fixes https://github.com/conda-forge/xgboost-feedstock/issues/172