conda-forge / xgboost-feedstock

A conda-smithy repository for xgboost.
BSD 3-Clause "New" or "Revised" License

Upgrade to XGBoost 2.1.0 #167

Closed (hcho3 closed this 4 months ago)

hcho3 commented 5 months ago

Checklist

Fixes https://github.com/conda-forge/xgboost-feedstock/issues/172

conda-forge-webservices[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

hcho3 commented 4 months ago

@jakirkham I reverted the lint-related changes. For now, we can ignore the lint error then?

jakirkham commented 4 months ago

Yes, please feel free to. I was planning to look more closely once we sorted out the build issues. Sorry if that was distracting.

Do you have thoughts on the error we are seeing here ( https://github.com/conda-forge/xgboost-feedstock/pull/167#issuecomment-2246723367 )?

hcho3 commented 4 months ago

@jakirkham I added back ${CUDAToolkit_INCLUDE_DIRS} to the include paths, but with the SYSTEM flag. This way, the CTK path will be searched after the other include paths.

Additionally, system include directories are searched after normal include directories, regardless of the order in which they are specified.

https://cmake.org/cmake/help/latest/command/target_include_directories.html
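For illustration, the kind of change described above could look like this in CMake (a sketch only; the target name `objxgboost` is taken from the build log, and the exact wiring in XGBoost's CMakeLists may differ):

```cmake
# Hypothetical sketch: mark the CUDA Toolkit headers as SYSTEM includes so
# they are searched after the project's own (non-SYSTEM) include directories.
find_package(CUDAToolkit REQUIRED)
target_include_directories(objxgboost SYSTEM PRIVATE ${CUDAToolkit_INCLUDE_DIRS})
```

With SYSTEM, the compiler also suppresses warnings from those headers, which is a common side benefit when pulling in large third-party header trees.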

conda-forge-webservices[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

jakirkham commented 4 months ago

@conda-forge-admin , please re-render

jakirkham commented 4 months ago

Seeing this error on CI:

FAILED: src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o 
$BUILD_PREFIX/bin/nvcc -forward-unknown-to-host-compiler -ccbin=$BUILD_PREFIX/bin/x86_64-conda-linux-gnu-c++ -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_CUSTOMIZE=1 -DDMLC_USE_CXX11=1 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DXGBOOST_BUILTIN_PREFETCH_PRESENT=1 -DXGBOOST_MM_PREFETCH_PRESENT=1 -DXGBOOST_USE_CUDA=1 -DXGBOOST_USE_NCCL=1 -D_MWAITXINTRIN_H_INCLUDED -D__USE_XOPEN2K8 -I$SRC_DIR/include -I$SRC_DIR/dmlc-core/include -I$SRC_DIR/rabit/include -I$SRC_DIR/gputreeshap -I$PREFIX/include -I$SRC_DIR/build-target/dmlc-core/include -isystem $BUILD_PREFIX/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_50,code=[sm_50]" "--generate-code=arch=compute_60,code=[sm_60]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_90,code=[sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler=-fopenmp -Xfatbin=-compress-all --default-stream per-thread -lineinfo -MD -MT src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o -MF src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o.d -x cu -c $SRC_DIR/src/tree/gpu_hist/evaluate_splits.cu -o src/CMakeFiles/objxgboost.dir/tree/gpu_hist/evaluate_splits.cu.o
$SRC_DIR/src/tree/gpu_hist/../updater_gpu_common.cuh(57): error: identifier "FLT_MAX" is undefined

Guessing we want to apply this patch: https://github.com/dmlc/xgboost/pull/10574

jakirkham commented 4 months ago

Am seeing this error in the Windows CUDA 11.8 builds:

Internal error: assertion failed at: "overload.c", line 28946 in deduce_class_template_args

1 catastrophic error detected in the compilation of "D:/bld/xgboost-split_1721846306365/work/src/data/ellpack_page_source.cu".
Compilation aborted.

ellpack_page_source.cu

nvcc error   : 'cudafe++' died with status 0xC0000409 

Is this related to the patch we are discussing above ( https://github.com/conda-forge/xgboost-feedstock/pull/167#discussion_r1690279847 )? If so, maybe this is a good reason to consider dropping it and restricting CCCL instead

hcho3 commented 4 months ago

Is this related to the patch we are discussing above ( https://github.com/conda-forge/xgboost-feedstock/pull/167#discussion_r1690279847 )? If so, maybe this is a good reason to consider dropping it and restricting CCCL instead

Not sure; we can first try restricting CCCL and see if the internal error goes away.
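A sketch of what "restricting CCCL" could look like in recipe/meta.yaml (the package name and version bound here are assumptions for illustration, not the actual pin used):

```yaml
requirements:
  host:
    # Hypothetical pin: cap CCCL at a version known to work with this
    # XGBoost release, to rule out a too-new Thrust/CUB as the culprit.
    - cuda-cccl <2.5  # illustrative bound only
```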

jakirkham commented 4 months ago

The good news is that it appears most of the errors we saw before are clearing out! 🥳 Thanks for all the patches to make that happen ❤️

The bad news is we are still seeing the new Windows compilation issue mentioned above: https://github.com/conda-forge/xgboost-feedstock/pull/167#issuecomment-2248731210

jakirkham commented 4 months ago

@conda-forge-admin , please re-render

jakirkham commented 4 months ago

The Windows CUDA 12 build has this error on CI:

  D:\bld\xgboost-split_1721864376122\_build_env\Library\include\crt/host_config.h(153):
  fatal error C1189: #error: -- unsupported Microsoft Visual Studio version!
  Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc
  flag '-allow-unsupported-compiler' can be used to override this version
  check; however, using an unsupported host compiler may cause compilation
  failure or incorrect run time execution.  Use at your own risk.

This is odd, as it says 2022 should be supported, and that is what we are using.

Will wait and see if this helps Windows CUDA 11.8 before spending more time on this angle

Edit: No luck. Same issue with CUDA 11.8
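If one wanted to experiment with the override flag that nvcc mentions, it could be threaded through CMake like this (a sketch only; as the error message itself says, this is at your own risk):

```cmake
# Hypothetical workaround: forward nvcc's escape hatch through CMake.
# This silences the host-compiler version check; it may mask real
# incompatibilities, so treat it as a diagnostic step, not a fix.
string(APPEND CMAKE_CUDA_FLAGS " -allow-unsupported-compiler")
```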

jakirkham commented 4 months ago

@conda-forge-admin , please re-render

hcho3 commented 4 months ago

I think the error comes from a compiler bug in NVCC 11.8. I was able to reproduce the same error locally with NVCC 11.8 + MSVC 2022.

Let's consider dropping 11.8 for Windows then?

jakirkham commented 4 months ago

@conda-forge-admin , please re-render

github-actions[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/xgboost-feedstock/actions/runs/10086710216.

jakirkham commented 4 months ago

@conda-forge-admin , please re-render

jakirkham commented 4 months ago

Let's consider dropping 11.8 for Windows then?

If the latest change doesn't work, then we can turn this into an issue and skip it
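Skipping that combination would typically be expressed with a selector in recipe/meta.yaml; a sketch (the `cuda_compiler_version` variable name comes from conda-forge's CUDA migrators and is assumed here):

```yaml
build:
  # Hypothetical skip: drop only the Windows + CUDA 11.8 entry
  # from the build matrix, leaving other variants untouched.
  skip: true  # [win and cuda_compiler_version == "11.8"]
```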

jakirkham commented 4 months ago

@conda-forge-admin , please re-render

github-actions[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/xgboost-feedstock/actions/runs/10088820119.

conda-forge-webservices[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

jakirkham commented 4 months ago

Thanks Hyunsu

Looks good. Please feel free to add the automerge label when you are ready

The bot will then merge when CI completes. If that doesn't happen for some reason, it will leave a comment letting us know why

github-actions[bot] commented 4 months ago

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

Thus the PR was not passing and was not merged.

jakirkham commented 4 months ago

Thanks Hyunsu! 🙏

trivialfis commented 4 months ago

Hi @jakirkham @hcho3, is it normal to have this under a CUDA 12.5 environment:

  + libxgboost                                   2.1.0  cuda120_h9dfd3e9_1                 conda-forge/linux-64       93MB
  + xgboost                                      2.1.0  cuda118_pyh98e67c5_1               conda-forge/noarch         17kB

I'm updating an environment that has XGBoost as a dependency. The above two lines are part of the mamba output, and it seems a bit weird to me to have mismatched CUDA versions.

Second question, since all C++ includes are distributed, maybe it makes sense to distribute the CMake config files as well?

jakirkham commented 4 months ago

@trivialfis this is a good question

AIUI in terms of CUDA usage, only libxgboost has CUDA bits. So xgboost doesn't have CUDA bits (as it is pure Python)

That said, it certainly is confusing to have xgboost mislabeled in this way. Also, users that constrain xgboost based on CUDA version may expect that constraint to carry through to libxgboost.

So agree we should fix this. Though think the risk is relatively low. Started doing this in PR: https://github.com/conda-forge/xgboost-feedstock/pull/188
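One way to decouple the pure-Python package's build string from the CUDA variant is to set it explicitly on that output; a sketch (the exact change made in PR 188 may differ):

```yaml
outputs:
  - name: xgboost
    build:
      # Hypothetical override: give the pure-Python output a CUDA-free
      # build string instead of inheriting the cuda118 variant label.
      string: pyh{{ PKG_HASH }}_{{ PKG_BUILDNUM }}
```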

jakirkham commented 4 months ago

Second question, since all C++ includes are distributed, maybe it makes sense to distribute the CMake config files as well?

Also sorry for not answering the CMake question earlier

Thank you for opening a new issue to discuss: https://github.com/conda-forge/xgboost-feedstock/issues/189

Hopefully the answers there are helpful