GoogleCloudPlatform / cloud-profiler-python

Stackdriver Profiler Python agent is a tool that continuously gathers CPU usage information from Python applications
Apache License 2.0

Python 3.12 `xgboost.core.XGBoostError: Invalid Parameter format for nthread expect int but value='-1'` when `DMatrix` used with `import googlecloudprofiler`. #144

Open rtb-zla-karma opened 4 months ago

This issue was originally posted in the xgboost repo: https://github.com/dmlc/xgboost/issues/10224

Hi

I'm hitting a very peculiar error that appeared after I updated the versions of Python and the libraries in a project I'm working on.

A minimal example that reproduces it:

# file.py
import googlecloudprofiler
from xgboost import DMatrix

DMatrix([[]])
print("works")

# requirements.txt
xgboost==2.0.3
google-cloud-profiler==4.1.0
#
numpy==1.26.4
scipy==1.13.0
google-api-python-client==2.125.0
google-auth==2.29.0
google-auth-httplib2==0.2.0
protobuf==4.25.3
requests==2.31.0
#
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
google-api-core==2.18.0
httplib2==0.22.0
idna==3.6
pyasn1==0.6.0
pyasn1_modules==0.4.0
pyparsing==3.1.2
rsa==4.9
uritemplate==4.1.1
urllib3==2.2.1

Python 3.12.2

Install with

pip install -r requirements.txt --no-deps

Run with

python file.py

Results in

Traceback (most recent call last):
  File "/project/path/file.py", line 4, in <module>
    DMatrix([[]])
  File "/venv/path/lib/python3.12/site-packages/xgboost/core.py", line 730, in inner_f
    return func(**kwargs)
           ^^^^^^^^^^^^^^
  File "/venv/path/lib/python3.12/site-packages/xgboost/core.py", line 857, in __init__
    handle, feature_names, feature_types = dispatch_data_backend(
                                           ^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/path/lib/python3.12/site-packages/xgboost/data.py", line 1081, in dispatch_data_backend
    return _from_list(data, missing, threads, feature_names, feature_types)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/path/lib/python3.12/site-packages/xgboost/data.py", line 1011, in _from_list
    return _from_numpy_array(array, missing, n_threads, feature_names, feature_types)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/path/lib/python3.12/site-packages/xgboost/data.py", line 207, in _from_numpy_array
    _check_call(
  File "/venv/path/lib/python3.12/site-packages/xgboost/core.py", line 282, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: Invalid Parameter format for nthread expect int but value='-1'

Removing `import googlecloudprofiler` from file.py makes the error go away. I really have no idea why merely importing the library triggers it; it would make more sense if it only happened after `googlecloudprofiler.start` was called.
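One way to confirm that the import alone is the trigger is to run both variants in fresh interpreters, so no state leaks between them. This is only a sketch: it assumes both packages are installed in the current environment, and `run_snippet` is a hypothetical helper written for this illustration, not part of either library.

```python
import subprocess
import sys


def run_snippet(code: str) -> tuple[int, str]:
    """Run `code` in a fresh Python interpreter (hypothetical helper).

    Returns the exit code and the last non-empty line of stderr,
    which is the exception message when the snippet raises.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    stderr_lines = [line for line in proc.stderr.splitlines() if line.strip()]
    return proc.returncode, (stderr_lines[-1] if stderr_lines else "")


# The two variants differ only by the googlecloudprofiler import.
WITHOUT_PROFILER = "from xgboost import DMatrix\nDMatrix([[]])\n"
WITH_PROFILER = "import googlecloudprofiler\n" + WITHOUT_PROFILER

if __name__ == "__main__":
    for label, code in [
        ("without import", WITHOUT_PROFILER),
        ("with import", WITH_PROFILER),
    ]:
        returncode, last_line = run_snippet(code)
        print(f"{label}: exit={returncode} {last_line}")
```

If the import really is the trigger, only the second variant should exit non-zero and report the `XGBoostError` line from the traceback above.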

Moreover, the code works with xgboost==1.7.6 and fails from xgboost==2.0.0 onward.
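When comparing a working and a failing environment, it may help to record exactly which versions are installed in each. The following is a minimal sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name, and the package list is just the subset most relevant here.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    relevant = ["xgboost", "google-cloud-profiler", "numpy"]
    for name, version in installed_versions(relevant).items():
        print(f"{name}=={version}")
```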

A maintainer of xgboost mentioned:

loading the _profiler.cpython-312-x86_64-linux-gnu.so inside google profiler extension causes the error

https://github.com/dmlc/xgboost/issues/10224#issuecomment-2077751030
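To see which shared objects are actually mapped into the process after the import, something like the sketch below could be used. It is Linux-only (it reads `/proc/self/maps`) and `loaded_shared_objects` is a hypothetical helper; it only lists what is loaded and does not by itself explain why loading the extension changes xgboost's behaviour.

```python
from pathlib import Path


def loaded_shared_objects(needle: str = "") -> list[str]:
    """Return paths of files mapped into this process that contain `needle`.

    Linux-only: reads /proc/self/maps. Returns an empty list elsewhere.
    """
    maps = Path("/proc/self/maps")
    if not maps.exists():
        return []
    paths = set()
    for line in maps.read_text().splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and fields[5].startswith("/") and needle in fields[5]:
            paths.add(fields[5])
    return sorted(paths)
```

For example, calling `loaded_shared_objects("_profiler")` before and after `import googlecloudprofiler` should show whether the extension module named in the quote has been loaded.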

This is why I've opened an issue here.