elephaint / pgbm

Probabilistic Gradient Boosting Machines
Apache License 2.0

Error messages when importing PGBM #17

Closed ivan-marroquin closed 1 year ago

ivan-marroquin commented 1 year ago

Describe the bug

I have Python 3.8.10 on a Windows 10 machine, with CUDA 11.0 installed. To install PyTorch, I used this command:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Note that the pytorch installation seems to be fine since the command "torch.cuda.is_available()" returns True.
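A note on what this check covers: `torch.cuda.is_available()` reports whether PyTorch can reach a GPU at runtime, which the pip wheels can do with their bundled CUDA libraries. It does not show whether a CUDA toolkit with `nvcc` can be found for compiling extensions, which is the step that fails in the traceback in this report. A minimal sketch that prints both kinds of information side by side; the function name `torch_cuda_summary` is hypothetical, and it returns `None` when PyTorch is not installed:

```python
import os


def torch_cuda_summary():
    """Collect runtime and build-related CUDA facts; None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch_version": torch.__version__,
        # CUDA version the installed wheel was built against (None for CPU builds).
        "cuda_build": torch.version.cuda,
        # Runtime check only: says nothing about nvcc or the compiler toolchain.
        "gpu_available": torch.cuda.is_available(),
        # Variables commonly used to locate the CUDA toolkit for builds.
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "CUDA_PATH": os.environ.get("CUDA_PATH"),
    }


if __name__ == "__main__":
    print(torch_cuda_summary())
```

If `cuda_build` and the toolkit under `CUDA_HOME`/`CUDA_PATH` differ (here, a cu113 wheel next to a CUDA 11.0 toolkit), that is worth noting when reporting build problems.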

Then I installed PGBM using pip.

When I run the command "from pgbm import PGBM", I get the following error messages:

Detected CUDA files, patching ldflags
Emitting ninja build file C:\Users\imarroquin\AppData\Local\torch_extensions\torch_extensions\Cache\py38_cu113\split_decision\build.ninja...
Building extension module split_decision...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\bin\nvcc --generate-dependencies-with-compile --dependency-output splitgain_kernel.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=split_decision -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include\TH -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\include" -IC:\Temp\Python_3.8.10\Include -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -c C:\Temp\Python_3.8.10\lib\site-packages\pgbm\splitgain_kernel.cu -o splitgain_kernel.cuda.o
FAILED: splitgain_kernel.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\bin\nvcc --generate-dependencies-with-compile --dependency-output splitgain_kernel.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=split_decision -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include\TH -IC:\Temp\Python_3.8.10\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\include" -IC:\Temp\Python_3.8.10\Include -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -c C:\Temp\Python_3.8.10\lib\site-packages\pgbm\splitgain_kernel.cu -o splitgain_kernel.cuda.o
CreateProcess failed: The system cannot find the file specified.
ninja: fatal: ReadFile: The handle is invalid.

Traceback (most recent call last):
  File "C:\Temp\Python_3.8.10\lib\site-packages\torch\utils\cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "C:\Temp\Python_3.8.10\lib\subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Temp\Python_3.8.10\lib\site-packages\pgbm\__init__.py", line 1, in
    from .pgbm import PGBM, PGBMRegressor
  File "C:\Temp\Python_3.8.10\lib\site-packages\pgbm\pgbm.py", line 41, in
    load(name="split_decision",
  File "C:\Temp\Python_3.8.10\lib\site-packages\torch\utils\cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "C:\Temp\Python_3.8.10\lib\site-packages\torch\utils\cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Temp\Python_3.8.10\lib\site-packages\torch\utils\cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "C:\Temp\Python_3.8.10\lib\site-packages\torch\utils\cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'split_decision'
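One detail in the log stands out: the failing command starts with `...\CUDA\v11.0\bin\bin\nvcc`, and the include flag points at `...\CUDA\v11.0\bin\include`. PyTorch's extension builder appends `bin\nvcc` to the CUDA root it detects, so a doubled `bin` may mean the detected root (for example via `CUDA_HOME` or `CUDA_PATH`) is the toolkit's `bin` folder rather than the toolkit folder itself. That would also fit the `CreateProcess failed: The system cannot find the file specified` message. This is an inference from the log, not something confirmed in the thread. A minimal sketch for checking it, where `nvcc_candidates` is a hypothetical helper:

```python
import os
from pathlib import Path


def nvcc_candidates(env=None):
    """Return {variable: (expected nvcc path, exists?)} for each CUDA root variable set."""
    env = os.environ if env is None else env
    exe = "nvcc.exe" if os.name == "nt" else "nvcc"
    report = {}
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if root:
            # The compiler is expected at <root>/bin/nvcc, so <root> should be
            # the toolkit folder (e.g. ...\CUDA\v11.0), not its bin subfolder.
            nvcc = Path(root) / "bin" / exe
            report[var] = (str(nvcc), nvcc.is_file())
    return report


if __name__ == "__main__":
    for var, (path, found) in nvcc_candidates().items():
        print(f"{var}: {path} -> {'found' if found else 'MISSING'}")
```

If a variable is reported as missing and its path ends in `bin\bin`, pointing that variable at the parent folder would be the first thing to try.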

To Reproduce

Steps to reproduce the behavior:

  1. Install PyTorch as described above.
  2. Install PGBM using pip.
  3. Open a DOS terminal, start Python, and run "from pgbm import PGBM".

Expected behavior

No error messages when importing the PGBM package.


elephaint commented 1 year ago

Hi,

Thanks for reporting the issue. It seems Ninja can't find Python.

ivan-marroquin commented 1 year ago

Hi @elephaint

Thanks for your prompt reply. To answer your questions:

1) I have Build Tools for Visual Studio installed.

2) From a DOS terminal, the command "where cl" reports: C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\cl.exe

3) I also added the following environment variables:

LIB -> C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\lib\x64

Include -> C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include

Path -> C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64

4) I am not using a virtual environment.

Hope this helps,

Ivan
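The `where cl` check can be extended to the other tools the build invokes. The log shows `ninja` and `nvcc` being called in addition to the MSVC compiler, so a quick sketch that looks all three up on `PATH` might help narrow things down. The helper name `locate_build_tools` is hypothetical; `shutil.which` is the standard-library counterpart of `where`. Note that a tool found on `PATH` is not necessarily the one the build uses, since the log shows `nvcc` being called by its full path.

```python
import shutil


def locate_build_tools(tools=("cl", "ninja", "nvcc")):
    """Map each tool name to its resolved path on PATH, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in locate_build_tools().items():
        print(f"{tool:6s} {path or 'NOT FOUND on PATH'}")
```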

ivan-marroquin commented 1 year ago

Hi @elephaint

I tried the following:

Unfortunately, I still get the same error message when I run "from pgbm import PGBM".

Ivan

elephaint commented 1 year ago

Hi @ivan-marroquin,

It remains strange, and it seems Ninja can't find your Python installation. Your Python installation appears to be located in a temporary folder, so I'd suggest using a virtual environment manager like Conda to set up Python.

ACommunist commented 1 year ago

Hi @ivan-marroquin, I think it's caused by the version of PyTorch, or maybe the version of cl. When I use the latest version of PyTorch, I hit the same problem. But with the following version of PyTorch, it works in SageMaker Studio Lab, which runs Linux. (Two screenshots were attached here and are not preserved.) However, when I use the same setup on Windows, the problem persists, so I think it may also be related to the version of cl.

ivan-marroquin commented 1 year ago

Hi @elephaint and @ACommunist

Many thanks for your support and suggestions. In my case, I have to use CUDA 11, which in turn forces me to stay on the latest compatible version of PyTorch.

With respect to "cl", I tried both Visual Studio 2019 and 2022. On both occasions, I got the same error message.

Ivan

elephaint commented 1 year ago

@ivan-marroquin In case of incompatibility issues, I'd strongly suggest taking the virtual environment (i.e. conda) route, because then PyTorch will install its own CUDA toolkit version and you can still use your generic Windows CUDA toolkit for other projects. The steps would be to install Anaconda, open an Anaconda shell, and execute the following commands:

conda create -n new_env
conda activate new_env
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install pgbm

After that, it should work (you should install other dependencies to run the examples, e.g. matplotlib, separately).
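Before importing PGBM in the new environment, it may be worth confirming that the packages resolve to that environment and not to a system-wide install. A minimal sketch using only the standard library; `installed_location` is a hypothetical helper, and it deliberately avoids importing `pgbm`, because the import is what triggers the extension build in the traceback from the original report.

```python
import importlib.util
import sys


def installed_location(package):
    """Return the file a top-level package would load from, or None if not installed."""
    spec = importlib.util.find_spec(package)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # The interpreter path should sit inside the activated environment.
    print("interpreter:", sys.executable)
    for name in ("torch", "pgbm"):
        print(f"{name}: {installed_location(name) or 'not installed'}")
```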

Also, if you don't like Anaconda, use miniconda, which is much lighter and basically contains everything you need.

ivan-marroquin commented 1 year ago

Hi @elephaint

thanks for the suggestions!

Ivan

elephaint commented 1 year ago

@ivan-marroquin Did you get it to work?

ivan-marroquin commented 1 year ago

Hi @elephaint ,

I have to talk with the developers first. Our development and testing of Python code is set up to use the generic CUDA install, and we do not use Anaconda (or Miniconda). Thus, I have to rely on pip for installing packages.

Thanks for everything, Ivan

elephaint commented 1 year ago

@ivan-marroquin Ok, that's unfortunate. I've created an extension to Scikit-learn's HistGradientBoostingRegressor that enables PGBM too; this would solve your issues. I'll let you know when that becomes available (it's currently filed as a merge request with scikit-learn; ideally it will be part of scikit-learn, but if that is not possible I'll publish the same method through the pgbm package).

ivan-marroquin commented 1 year ago

Hi @elephaint

That is great news. Many thanks for sharing this info.

Ivan

slavakx commented 1 year ago

Hi @elephaint, I have the same error on Linux.

I installed pytorch as you suggested

conda create -n new_env
conda activate new_env
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install pgbm

Then, I installed pgbm as

 pip install pgbm

Finally, I get this error during pgbm import

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1
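The `CalledProcessError` line is only the wrapper: it says that `ninja -v` exited with status 1, not why. The cause is in the build output printed before the traceback, as with the `CreateProcess failed` line in the original report, so that output is the part worth posting. The sketch below illustrates the pattern with a stand-in command (a Python one-liner that fails on purpose, not the real build); `run_and_explain` and the message text are hypothetical.

```python
import subprocess
import sys

# Stand-in for a failing build step: writes an illustrative reason to stderr
# and exits with status 1. This is not the real ninja/nvcc invocation.
FAILING_CMD = [sys.executable, "-c",
               "import sys; sys.stderr.write('underlying compiler error\\n'); sys.exit(1)"]


def run_and_explain(cmd):
    """Run cmd; return (exit status, captured stderr), with (0, '') on success."""
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # str(err) only carries the exit status; the reason is in err.stderr.
        return err.returncode, err.stderr.strip()
    return 0, ""


if __name__ == "__main__":
    print(run_and_explain(FAILING_CMD))
```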