elephaint / pgbm

Probabilistic Gradient Boosting Machines
Apache License 2.0

FAILED: splitgain_kernel.cuda.o #23

Closed: kevindarby closed this 11 months ago

kevindarby commented 11 months ago

Describe the bug: nvcc error.

To Reproduce: Steps to reproduce the behavior:

```python
from pgbm.sklearn import HistGradientBoostingRegressor, crps_ensemble
from pgbm.torch import PGBMRegressor  # If you want to use the Torch backend
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import numpy as np
```
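The failure below is triggered by the `from pgbm.torch import PGBMRegressor` line itself, which JIT-compiles the extension on import. For completeness, a minimal sketch of the kind of code the notebook would run afterwards (illustrative only, not part of the original repro):

```python
# Hypothetical continuation of the repro; the crash happens at import time,
# so none of this is strictly required to reproduce the nvcc failure.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = PGBMRegressor()  # default hyperparameters
model.fit(X_train, y_train)

yhat = model.predict(X_test)
print(mean_squared_error(y_test, yhat))
```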

```
Using /home/algo/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/algo/.cache/torch_extensions/py39_cu118/split_decision/build.ninja...
Building extension module split_decision...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF splitgain_cuda.o.d -DTORCH_EXTENSION_NAME=split_decision -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/TH -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/pgbm/torch/splitgain_cuda.cpp -o splitgain_cuda.o
[2/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=split_decision -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/TH -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -std=c++17 -c /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/pgbm/torch/splitgain_kernel.cu -o splitgain_kernel.cuda.o
FAILED: splitgain_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=split_decision -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/TH -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -std=c++17 -c /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/pgbm/torch/splitgain_kernel.cu -o splitgain_kernel.cuda.o
/home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/pybind11/cast.h:42:120: error: expected template-name before ‘<’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                            ^
/home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/pybind11/cast.h:42:120: error: expected identifier before ‘<’ token
/home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/pybind11/cast.h:42:123: error: expected primary-expression before ‘>’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                               ^
/home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/pybind11/cast.h:42:126: error: expected primary-expression before ‘)’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                 ^
ninja: build stopped: subcommand failed.
```

Expected behavior: expect it to build.


Desktop: AlmaLinux 8 (alma8)

nvcc 12.1:

```
(.venv) algo@ch3li-fs03:/usr/local/cuda-12.1/bin$ ./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```

gcc 12:

```
algo@ch3li-fs03:~/code/cqg/bts/spark/ml$ gcc --version
gcc (GCC) 12.1.1 20220628 (Red Hat 12.1.1-3)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```


kevindarby commented 11 months ago

Sorry for the long paste; going to try with nvcc 11.8 next.

kevindarby commented 11 months ago

No dice with 11.8:

```
/usr/local/cuda-11/include/crt/host_config.h:132:2: error: #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  132 | #error -- unsupported GNU version!
```
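In principle that check can be overridden: PyTorch's JIT extension builder appends the contents of the `TORCH_NVCC_FLAGS` environment variable to its nvcc invocations, so a sketch like the following should pass the flag through (untested here, and explicitly at-your-own-risk per the nvcc message):

```python
import os

# Untested workaround sketch: torch.utils.cpp_extension appends TORCH_NVCC_FLAGS
# to its nvcc invocations, so set it before pgbm.torch triggers the JIT build.
os.environ["TORCH_NVCC_FLAGS"] = "-allow-unsupported-compiler"

from pgbm.torch import PGBMRegressor  # extension is compiled on first import
```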

elephaint commented 11 months ago

Hi,

Thanks for reporting; some questions:

kevindarby commented 11 months ago

Hi, sure, I have PyTorch cu118.

CUDA toolkit 12.1; upgrading to 12.2, where they say this is fixed: https://github.com/pybind/pybind11/issues/4606

The sklearn backend works, but I have a big dataset, so I wanted to try the CUDA backend. It's just from a notebook I'm working on.

I'll try the 12.2 pybind fix, and if that doesn't work I'll see if there are nightly builds of PyTorch cu122.
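For reference, the mismatch between the toolkit used for the JIT build and the CUDA version PyTorch itself was built against can be checked like this (my own note; `CUDA_HOME` is what the logs report as the JIT toolkit):

```python
import torch
from torch.utils import cpp_extension

print(torch.version.cuda)       # CUDA version PyTorch was built with, e.g. '11.8'
print(cpp_extension.CUDA_HOME)  # toolkit used for JIT extension builds, e.g. '/usr/local/cuda-12'
```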

kevindarby commented 11 months ago

CUDA 12.2 fixed it:

```
/home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-12'
Using /home/algo/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Emitting ninja build file /home/algo/.cache/torch_extensions/py39_cu118/split_decision/build.ninja...
Building extension module split_decision...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF splitgain_cpu.o.d -DTORCH_EXTENSION_NAME=split_decision -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/TH -isystem /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/include/THC -isystem /usr/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/pgbm-2.1.1-py3.9-linux-x86_64.egg/pgbm/torch/splitgain_cpu.cpp -o splitgain_cpu.o
[2/2] c++ splitgain_cpu.o -shared -L/home/algo/code/cqg/bts/spark/ml/.venv/lib64/python3.9/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o split_decision.so
Loading extension module split_decision...
Using /home/algo/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
No modifications detected for re-loaded extension module split_decision, skipping build step...
Loading extension module split_decision...
```

elephaint commented 11 months ago

Ah, great. I did not (yet) know about this issue with CUDA 12.1, but good to hear that it is fixed. Does it work now for your dataset?

kevindarby commented 11 months ago

It does; this is a really nice tool.

There is a slight issue when installing from source with GCC 12.

I think it's Cython being strict about noexcept, so one has to change splitting.pyx and add noexcept to compare_cat_infos:

```cython
cdef int compare_cat_infos(const void *a, const void *b) noexcept nogil:
```
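My guess as to why this surfaces (more likely a Cython 3 thing than GCC itself, and assuming the port keeps sklearn's pattern of passing this comparator to C's qsort): Cython 3 treats a cdef function as potentially raising unless it's marked noexcept, and a maybe-raising function no longer matches the noexcept function pointer qsort expects. As a diff:

```diff
-cdef int compare_cat_infos(const void *a, const void *b) nogil:
+cdef int compare_cat_infos(const void *a, const void *b) noexcept nogil:
```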

I can write this up separately if you'd like

kevindarby commented 11 months ago

P.S. Have you ever thought about using a similar method to fit a (vine) copula that captures the relationship between the marginals?

It's my hunch that, for some data, the copula itself is more stable than the means/variances.
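To sketch what I mean (a toy illustration with numpy/scipy, nothing pgbm-specific): a Gaussian copula separates the dependence structure from the marginals by mapping each marginal to pseudo-uniforms through its empirical CDF and fitting a correlation on the normal scores, which is exactly the kind of quantity that could stay stable even when the marginals shift.

```python
import numpy as np
from scipy import stats

def fit_gaussian_copula(samples: np.ndarray) -> np.ndarray:
    """Estimate a Gaussian copula correlation matrix from (n_samples, n_dims)
    draws; the result depends only on the ranks, not on the marginals."""
    n, _ = samples.shape
    u = stats.rankdata(samples, axis=0) / (n + 1)  # pseudo-observations in (0, 1)
    z = stats.norm.ppf(u)                          # normal scores
    return np.corrcoef(z, rowvar=False)

rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)
x[:, 1] = np.exp(x[:, 1])  # distort one marginal; the dependence is unchanged
print(fit_gaussian_copula(x))  # off-diagonal stays close to 0.8
```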

elephaint commented 11 months ago

Regarding the cython part: yes, that would be great! I ported sklearn's implementation and have a bunch of tests running automatically, so it could be that there are errors with certain versions of gcc. I guess a pull request is the best idea then.

Regarding the last part: good point and a nice idea; no, I did not yet try this (I tried many other variants, e.g. to take away the loc-scale distribution assumption, none of which I could satisfactorily get to work :)). There are more methods of doing this; it's a bit of a tradeoff between simplicity (i.e. high training speed and low storage requirements) and performance (i.e. distributional and point accuracy). I very much value the speed of GBMs, especially for large-scale settings, so our solution leans towards being a bit more efficient rather than squeezing out every last bit of performance. I'm generally of the opinion that it's better to have a 90% good answer quickly than a 100% good answer slowly, as quickly iterating over solutions is more valuable than trying out a single solution.

kevindarby commented 11 months ago

Cool, I submitted a PR (https://github.com/elephaint/pgbm/pull/24), thanks.

I will go down the copula rabbit hole for a while and let you know if I come up with anything. Thanks!