AlexanderKroll / ESP_prediction_function

MIT License

xgboost error #3

Closed ryamy closed 1 year ago

ryamy commented 1 year ago

Thanks for your excellent work. I am interested in ESP prediction. I ran Tutorial ESP prediction.ipynb without modification, and the following error was raised in cell[2]. How can I resolve it? I would appreciate any comments.

Step 1/3: Calculating numerical representations for all metabolites.
Step 2/3: Calculating numerical representations for all enzymes.
.....2(a) Loading ESM-1b model.
.....2(b) Loading model parameters for task-specific model.
.....2(c) Calculating enzyme representations.
Step 3/3: Making predictions for ESP.
(3, 100) (3, 1280)
[12:46:17] WARNING: ../src/gbm/gbtree.cc:350: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
Cell In[3], line 8
      1 substrates = ["C00069",
      2               "C00002",
      3              "C00002"]
      4 enzymes = ["MARLPFYLLVISTLLLVVTADSFLARPPSSSFLHALSNKRASTPASLPSCSLDFLLQTRGGTAANAATTALPTSALVERKGGAAVALEGGKTLWEKSKVWVFIGLWYFFNVAFNIYNKKVLNALPLPWTVSIAQLGLGALYTMFLWLVRARKMPTIAAPEMKTLSILGVLHAVSHITAITSLGAGAVSFTHIVKSAEPFFSAVFAGLFFGQFFSLPVYAALIPVVSGVAYASLKELTFTWLSFWCAMASNVVCAARGVVVKGMMGGKPTQSKDLTSSNMYSVLTILAALVLLPFGALVEGPGLHAAWKAAAAHPSLTNGGTELAKYLVYSGLTFFLYNEVAFAALESLHPISHAVANTIKRVVIIVVSVLVFRNPMSTQSIIGSSTAVIGVLLYSLAKHYCK",
      5            "MKGRRRRRREYCKFALLLVLYTLVLLLVPSVLDGGRDGDKGAEHCPGLQRSLGVWSLEAAAAGEREQGAEARAAEEGGANQSPRFPSNLSGAVGEAVSREKQHIYVHATWRTGSSFLGELFNQHPDVFYLYEPMWHLWQALYPGDAESLQGALRDMLRSLFRCDFSVLRLYAPPGDPAARAPDTANLTTAALFRWRTNKVICSPPLCPGAPRARAEVGLVEDTACERSCPPVAIRALEAECRKYPVVVIKDVRLLDLGVLVPLLRDPGLNLKVVQLFRDPRAVHNSRLKSRQGLLRESIQVLRTRQRGDRFHRVLLAHGVGARPGGQSRALPAAPRADFFLTGALEVICEAWLRDLLFARGAPAWLRRRYLRLRYEDLVRQPRAQLRRLLRFSGLRALAALDAFALNMTRGAAYGADRPFHLSARDAREAVHAWRERLSREQVRQVEAACAPAMRLLAYPRSGEEGDAEQPREGETPLEMDADGAT",
      6           "MASNPDRGEILLTELQVDSRPLPFSENVSAVQKLDFSDTIVQQKLDDVKDRIKREIRKELKIKEGAENLRKVTTDKKNLAYVDNILKKSNKKLEELHHKLQELNAHIVVSDPEDYTDCPRTPDTPNSDSRSSTSNNRRLMALQKQLDIELKVKQGAENMIQMYSNGPSKDRKLHGTAQQLLQDNKTKIEVIRMHILQAVLTNELAFDNAKPVISPLELRNGRIIEHHFRIEFAVAEGAKNVMKLLGSGKVTDRKALSEAQARFNESSQKLDLLKYSLEQRLNELPKNHPKSSVVIEELSLVASPTLSPRQSMLSTQNQYSTLSKPAALTGTLEVRLWGAKISWENVPGRSKATSVALPGWSPSENRSSFMSRTSKSKSGSSRNLLKTDDLSNDVCAVLKLDNTVVGQTIWKPISNQSWDQKFTLELDRSRELEISVYWRDWRSLCAVKFLRLEDFLDNQRHGMALYLEPQGTLFAEVTFFNPVIERRPKLQRQKKIFSKQQGKTFLRAPQMNINIATWGRLVRRAIPTVNHSGTFSPQTPVPATVPVVDARTPELAPPASDSTVTKLDFDLEPEAPPAPPRASSLGEIDDSSELRVLDIPGQGSETVFDIENDRNNMRPKSKSEYELNIPDSSRSCWSVGELEDKRSQQRFQFNLQDFRCCAVLGRGHFGKVLLAEYKHTNEMFAIKALKKGDIVARDEVDSLMCEKRIFETVNSVRHPFLVNLFACFQTKEHVCFVMEYAAGGDLMMHIHTDVFSEPRAVFYAACVVLGLQYLHEHKIVYRDLKLDNLLLDTEASVKIADFGLCKEGMGYGDRTSTFCGTPEFLAPEVLTETSYTRAVDWWGLGVLIYEMLVGESPFPGDDEEEVFDSIVNDEVRYPRFLSTEAISIMRRLLRRNPERRLGAGEKDAEDVKKHPFFRLTDWSALLDKKVKPPFVPTIRGREDVSNFDDEFTSEAPILTPPREPRILLEEEQEMFRDFDYVADWC",
      7           ]
----> 8 df = ESP_predicton(substrate_list = substrates,
      9                enzyme_list = enzymes)
     10 df

File ~/ESP_prediction_function/code/ES_prediction.py:34, in ESP_predicton(substrate_list, enzyme_list)
     32     if len(df_ES_valid) > 0:
     33         X = calculate_xgb_input_matrix(df = df_ES_valid)
---> 34         ESs = predict_ES(X)
     35         df_ES_valid["Prediction"] = ESs
     37     df_ES = pd.concat([df_ES_valid, df_ES_invalid], ignore_index = True)

File ~/ESP_prediction_function/code/ES_prediction.py:72, in predict_ES(X)
     71 def predict_ES(X):
---> 72     bst = pickle.load(open(join("..", "data", "saved_models", "xgboost", "xgboost_model_production_mode_gnn_esm1b_ts.dat"), "rb"))
     73     feature_names =  ["GNN rep_" + str(i) for i in range(100)]
     74     feature_names = feature_names + ["ESM1b_" + str(i) for i in range(1280)]

File /opt/conda/envs/espp/lib/python3.8/site-packages/xgboost/core.py:1087, in Booster.__setstate__(self, state)
   1085     length = c_bst_ulong(len(buf))
   1086     ptr = (ctypes.c_char * len(buf)).from_buffer(buf)
-> 1087     _check_call(
   1088         _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
   1089     state['handle'] = handle
   1090 self.__dict__.update(state)

File /opt/conda/envs/espp/lib/python3.8/site-packages/xgboost/core.py:189, in _check_call(ret)
    178 """Check the return value of C API call
    179 
    180 This function will raise exception when error occurs.
   (...)
    186     return value from API calls
    187 """
    188 if ret != 0:
--> 189     raise XGBoostError(py_str(_LIB.XGBGetLastError()))

XGBoostError: [12:46:17] ../src/tree/tree_updater.cc:20: Unknown tree updater grow_gpu_hist
Stack trace:
  [bt] (0) /opt/conda/envs/espp/lib/libxgboost.so(xgboost::TreeUpdater::Create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, xgboost::GenericParameter const*)+0x3d5) [0x7fcc774cea25]
  [bt] (1) /opt/conda/envs/espp/lib/libxgboost.so(xgboost::gbm::GBTree::LoadConfig(xgboost::Json const&)+0x28c) [0x7fcc773bf7ec]
  [bt] (2) /opt/conda/envs/espp/lib/libxgboost.so(+0x1d4e67) [0x7fcc773e0e67]
  [bt] (3) /opt/conda/envs/espp/lib/libxgboost.so(+0x1e1d4b) [0x7fcc773edd4b]
  [bt] (4) /opt/conda/envs/espp/lib/libxgboost.so(XGBoosterUnserializeFromBuffer+0x5e) [0x7fcc772ad55e]
  [bt] (5) /opt/conda/envs/espp/lib/python3.8/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7fcd0aaf1a4a]
  [bt] (6) /opt/conda/envs/espp/lib/python3.8/lib-dynload/../../libffi.so.8(+0x5fea) [0x7fcd0aaf0fea]
  [bt] (7) /opt/conda/envs/espp/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x377) [0x7fcd0ad4ffc7]
  [bt] (8) /opt/conda/envs/espp/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x8fb4) [0x7fcd0ad45fb4]
(espp) user@hostname:~/ESP_prediction_function$ conda list | grep -e pandas -e torch -e numpy -e fair-esm -e rdkit -e xgboost
_py-xgboost-mutex         2.0                       cpu_0    conda-forge
fair-esm                  0.4.0                    pypi_0    pypi
libxgboost                1.3.3                h9c3ff4c_2    conda-forge
numpy                     1.23.1                   pypi_0    pypi
pandas                    1.3.1                    pypi_0    pypi
py-xgboost                1.3.3            py38h578d9bd_2    conda-forge
rdkit                     2022.09.5        py38h6600b1c_0    conda-forge
torch                     1.12.1+cu113             pypi_0    pypi
torchaudio                0.12.1+cu113             pypi_0    pypi
torchvision               0.13.1+cu113             pypi_0    pypi

(espp) user@hostname:~/ESP_prediction_function$ nvidia-smi
Tue Oct 17 13:04:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    58W / 400W |   1625MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     26207      C   ...a/envs/espp/bin/python3.8     1623MiB |
+-----------------------------------------------------------------------------+
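
The `conda list` output above already hints at the cause: the `_py-xgboost-mutex` package has the build string `cpu_0`, while the traceback shows the pickled model asking for the `grow_gpu_hist` updater. Below is a minimal sketch of how that listing could be checked programmatically. It assumes that a build string starting with `cpu` marks a CPU-only variant (an assumption based on this listing, not a documented guarantee). The helper names `xgboost_build_variants` and `looks_cpu_only` are hypothetical, and the sample text is copied from the output above.

```python
def xgboost_build_variants(conda_list_output: str) -> dict:
    """Map each xgboost-related package name to its conda build string."""
    variants = {}
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Expected columns: name, version, build, channel.
        if len(fields) >= 3 and "xgboost" in fields[0]:
            variants[fields[0]] = fields[2]
    return variants


def looks_cpu_only(conda_list_output: str) -> bool:
    """Heuristic (assumption): a build string starting with 'cpu' suggests a CPU-only install."""
    builds = xgboost_build_variants(conda_list_output)
    return any(build.startswith("cpu") for build in builds.values())


# Sample rows copied from the `conda list` output in this comment.
listing = """\
_py-xgboost-mutex         2.0                       cpu_0    conda-forge
libxgboost                1.3.3                h9c3ff4c_2    conda-forge
py-xgboost                1.3.3            py38h578d9bd_2    conda-forge
"""

print(xgboost_build_variants(listing))
print(looks_cpu_only(listing))
```

In practice the text could come from running `conda list` in the environment; the check is only a hint, and actually trying to train with `gpu_hist` is the more direct test.
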
ryamy commented 1 year ago

I found that the installed xgboost cannot use the GPU:

(espp) user@hostname:~/ESP_prediction_function$ python
Python 3.8.18 | packaged by conda-forge | (default, Oct 10 2023, 15:44:36)
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xgboost as xgb
>>> import pandas as pd
>>> import numpy
>>> import numpy as np
>>> data_url = "http://lib.stat.cmu.edu/datasets/boston"
>>> raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
>>> data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
>>> target = raw_df.values[1::2, 2]
>>> params = {'tree_method': 'hist', 'max_depth': 3, 'learning_rate': 0.1}
>>> dtrain = xgb.DMatrix(data, target)
/opt/conda/envs/espp/lib/python3.8/site-packages/xgboost/data.py:104: UserWarning: Use subset (sliced data) of np.ndarray is not recommended because it will generate extra copies and increase memory consumption
  warnings.warn(
>>> xgb.train(params, dtrain, evals=[(dtrain, "train")])
[0] train-rmse:21.60208
[1] train-rmse:19.55555
[2] train-rmse:17.71453
[3] train-rmse:16.06071
[4] train-rmse:14.57054
[5] train-rmse:13.23501
[6] train-rmse:12.03721
[7] train-rmse:10.94917
[8] train-rmse:9.98378
[9] train-rmse:9.10604
<xgboost.core.Booster object at 0x7f90dd0de820>
>>> params = {'tree_method': 'gpu_hist', 'max_depth': 3, 'learning_rate': 0.1}
>>>
>>> xgb.train(params, dtrain, evals=[(dtrain, "train")])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/espp/lib/python3.8/site-packages/xgboost/training.py", line 227, in train
    bst = _train_internal(params, dtrain,
  File "/opt/conda/envs/espp/lib/python3.8/site-packages/xgboost/training.py", line 102, in _train_internal
    bst.update(dtrain, i, obj)
  File "/opt/conda/envs/espp/lib/python3.8/site-packages/xgboost/core.py", line 1280, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "/opt/conda/envs/espp/lib/python3.8/site-packages/xgboost/core.py", line 189, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [02:02:19] ../src/gbm/../common/common.h:156: XGBoost version not compiled with GPU support.
Stack trace:
  [bt] (0) /opt/conda/envs/espp/lib/libxgboost.so(+0x9f738) [0x7f90d4c47738]
  [bt] (1) /opt/conda/envs/espp/lib/libxgboost.so(xgboost::gbm::GBTree::ConfigureUpdaters()+0x106) [0x7f90d4d44886]
  [bt] (2) /opt/conda/envs/espp/lib/libxgboost.so(xgboost::gbm::GBTree::Configure(std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)+0x241) [0x7f90d4d597c1]
  [bt] (3) /opt/conda/envs/espp/lib/libxgboost.so(+0x1e0c5e) [0x7f90d4d88c5e]
  [bt] (4) /opt/conda/envs/espp/lib/libxgboost.so(+0x1cfdbd) [0x7f90d4d77dbd]
  [bt] (5) /opt/conda/envs/espp/lib/libxgboost.so(XGBoosterUpdateOneIter+0x64) [0x7f90d4c4cb04]
  [bt] (6) /opt/conda/envs/espp/lib/python3.8/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7f90faa13a4a]
  [bt] (7) /opt/conda/envs/espp/lib/python3.8/lib-dynload/../../libffi.so.8(+0x5fea) [0x7f90faa12fea]
  [bt] (8) /opt/conda/envs/espp/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x377) [0x7f90faa2bfc7]

>>>

So I uninstalled xgboost, which had been installed via conda install -c conda-forge py-xgboost=1.3.3 (following README.md), and reinstalled it with pip install xgboost. This installed scipy-1.10.1 and xgboost-2.0.0.

After that, Tutorial ESP prediction.ipynb works fine in my environment. I hope this helps anyone facing a similar issue.
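
To confirm which build ended up installed after a reinstall like this, a quick check along the following lines may help. It is a sketch, not part of the repository: `cuda_enabled` and `describe_xgboost` are hypothetical helper names. It relies on `xgboost.build_info()`, which newer releases provide but 1.3.3 does not, so the code falls back gracefully when it is missing. The `USE_CUDA` key is an assumption about the returned dictionary, whose format may differ between versions.

```python
def cuda_enabled(build_info: dict) -> bool:
    """Return True if a build-info mapping reports CUDA support."""
    # "USE_CUDA" is assumed to be the relevant key; a missing key counts as False.
    return bool(build_info.get("USE_CUDA", False))


def describe_xgboost() -> str:
    """Summarize the installed xgboost without failing when it is absent or old."""
    try:
        import xgboost as xgb
    except Exception as exc:  # not installed, or the native library failed to load
        return f"xgboost could not be imported: {exc}"
    # build_info() is absent in older releases such as 1.3.3, hence getattr.
    build_info = getattr(xgb, "build_info", None)
    if build_info is None:
        return f"xgboost {xgb.__version__}: build info unavailable"
    flag = "with" if cuda_enabled(build_info()) else "without"
    return f"xgboost {xgb.__version__}: built {flag} CUDA support"


print(describe_xgboost())
```

If the summary reports a build without CUDA support, loading a model that was pickled with a GPU tree updater would be expected to fail in the same way as the original traceback.
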