stavoltafunzia opened this issue 7 months ago
Haven't been able to reproduce with CUDA 12.3, trying 12.4 now.
Still haven't reproduced it.
That's odd. How come the last error is cudaErrorMemoryAllocation?
Except for Debian vs. Ubuntu, I have pretty much the same configuration:
OS: Debian 12, with NVIDIA driver 550.54.15 and CUDA 12.4
Hardware: RTX 4000 series card present
Apparently, there was another process using my GPU and consuming almost all of the VRAM. After closing that application, the example above works. Nevertheless, I don't know whether it should be considered a bug that any XGBoost application (even a CPU-only one) crashes due to issues at the CUDA layer.
That makes sense; I will open a PR to work around it. XGBoost needs to know whether the input data lives on the GPU or the CPU, and we use the CUDA runtime to obtain this information. As a result, a CUDA error is raised while checking the input data.
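To illustrate the workaround direction described above, here is a minimal sketch in plain Python. It is not XGBoost's actual code; `probe_device` and `detect_device` are hypothetical names, and the probe simply raises to mimic a `cudaErrorMemoryAllocation` from an exhausted GPU. The point is the pattern: treat any failure of the device probe as "the data is on the CPU" instead of letting the error reach CPU-only users.

```python
def probe_device(data):
    """Stand-in for a CUDA-runtime query on the data pointer.

    Here it always raises, mimicking cudaErrorMemoryAllocation on a GPU
    whose memory is already consumed by another process.
    """
    raise RuntimeError("cudaErrorMemoryAllocation")


def detect_device(data):
    # Workaround pattern: swallow the CUDA error and fall back to CPU,
    # so CPU-only training is unaffected by an unhealthy CUDA context.
    try:
        return probe_device(data)
    except RuntimeError:
        return "cpu"


print(detect_device([1.0, 2.0, 3.0]))  # falls back to "cpu"
```

The trade-off of this pattern is that a genuine GPU problem is silenced for CPU users, which matches the intent here: a broken CUDA context should not break CPU-only workloads.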
I've recently upgraded to XGBoost 2.0.3 (Python), and since then I cannot use it anymore as it keeps crashing. The following simple code fails to run:
And the error message shows the following traceback:
It surprises me that it throws a CUDA-related error, even though I'm only trying to use classic CPU XGBoost. My configuration is as follows:
The code above used to run flawlessly with Python xgboost 1.7.x.
2024-04-09 update: it turns out that there was another process using my GPU and consuming almost all of the VRAM. After closing that application, the example above works. Nevertheless, I don't know whether it should be considered a bug that any XGBoost application (even a CPU-only one) crashes due to issues at the CUDA layer. I leave this decision to the developers (though I personally think it should not happen).