Closed: chesh27 closed this issue 4 years ago
Hi, which model are you using?
Dave
From: Cheshta Dhingra <notifications@github.com>
Sent: Thursday, March 19, 2020 2:41 PM
To: closedloop-ai/cv19index <cv19index@noreply.github.com>
Subject: [closedloop-ai/cv19index] Running the CV19 Index Predictor (#3)
do_run(input_fpath, input_schema, model, output)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "..\cv19index\predict.py", line 360, in do_run
    model = read_model(model_fpath)
  File "..\cv19index\io.py", line 19, in read_model
    return pickle.load(fobj)
  File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 981, in __setstate__
    _check_call(_LIB.XGBoosterLoadModelFromBuffer(handle, ptr, length))
  File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 176, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
XGBoostError: [15:34:02] C:\Jenkins\workspace\xgboost-win64_release_0.90\src\gbm\gbm.cc:20: Unknown gbm type
Hi,
It might be worth upgrading to xgboost version 1.0.1 or greater as well. Not sure if that's the issue but we wrote this against version 1.0.1.
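If it helps, a quick way to check which version is active and to upgrade it (assuming a pip-based environment; these commands are generic, not from the repo):

```shell
# Print the currently installed xgboost version
python -c "import xgboost; print(xgboost.__version__)"

# Upgrade to 1.0.1 or later
pip install --upgrade "xgboost>=1.0.1"
```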
Thanks, Ben Tuttle
I had a similar issue when trying to load the xgboost model:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/predict.py", line 360, in do_run
    model = read_model(model_fpath)
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/io.py", line 19, in read_model
    return pickle.load(fobj)
  File "/home/user1/.local/lib/python3.6/site-packages/xgboost/core.py", line 1093, in __setstate__
    _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/xgboost/libxgboost.so: undefined symbol: XGBoosterUnserializeFromBuffer
and I'm running that with xgboost 1.0.1.
Ok, good to know that the version isn't the issue. Were you trying to load the "xgboost" model?
Yes, I'm just trying to run the example notebook using a Python kernel for the notebook.
This is definitely some kind of issue with the XGBoost install. It looks like the C libraries aren't installed correctly.
I think we are going to recommend using the conda xgboost install. We will add some directions on that.
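That conda-based install might look like this (the conda-forge channel is my assumption; the official directions may differ):

```shell
# Install xgboost with its bundled, correctly linked C libraries
conda install -c conda-forge xgboost
```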
I was also trying to run the XGBoost model in the Tutorial notebook. I upgraded XGBoost to version 1.0.1 and that seems to have resolved the issue. Thanks! Now if I want to run the Logistic Regression model, do I simply need to replace the model reference in the Tutorial with the following? model = resource_filename("cv19index", "resources/logistic_regression/lr.p")
The logistic regression model isn't currently hooked up the same way. We are going to address that with some new updates coming over the weekend.
Ok, thank you! I ran the XGBoost model on my data and am seeing ['Diagnosis of Respiratory signs and symptoms in the previous 12 months'] a lot in the "neg factors" column and not at all in the "pos factors" column.
Also seeing a lot of ['Age', 'Diagnosis of Neoplasm-related encounters in the previous 12 months', 'Diagnosis of Benign neoplasms in the previous 12 months'] in the "pos factors" column.
Shouldn't we expect to see respiratory issues show up in the positive factors column, since those would increase a patient's risk for COVID-19? Am I interpreting the results correctly?
In the output there should be a corresponding field called "pos_patient_values". This is an array that lines up with the pos_factors and gives you the actual value of the variable.
So if you see "Diagnosis of Respiratory signs and symptoms in the previous 12 months" as a negative factor, that should be paired with a value of "False". That means that the fact that a diagnosis wasn't seen contributed to a decrease in risk.
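As a concrete illustration of that pairing, here is a minimal Python sketch. It assumes the factor and value columns hold Python-style list strings, and the row contents below are invented for illustration; only the column-name pattern comes from the thread.

```python
import ast

# Hypothetical output row: the factor list and its matching value list,
# stored as list-literal strings (an assumption about the CSV encoding).
row = {
    "neg_factors": "['Diagnosis of Respiratory signs and symptoms in the previous 12 months']",
    "neg_patient_values": "[False]",
}

# Parse both lists and pair each factor with the patient's actual value.
factors = ast.literal_eval(row["neg_factors"])
values = ast.literal_eval(row["neg_patient_values"])
pairs = list(zip(factors, values))

for factor, value in pairs:
    # A negative factor paired with the value False means the *absence*
    # of that diagnosis contributed to a decrease in predicted risk.
    print(f"{factor}: {value}")
```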
We will try to think of a clearer way to present this. In our application we have a UI that presents it more clearly, so we aren't used to putting all of this in a CSV.
I think I found the root cause of my issue: there was an old version of xgboost (0.9.0) installed on the server, even though my local folder had version 1.0.1. The core.py file in xgboost tries to locate the libxgboost.so library file. It has a for loop that goes over the candidate paths, but it doesn't exit the loop after finding the correct libxgboost.so. In my case it found the 1.0.1 library first, then overrode it with the 0.9.0 copy it found later, which caused the issue.
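A toy illustration of the bug described above (this is not the actual xgboost core.py code; names and paths are made up): the loop keeps assigning on every match instead of breaking, so the last copy found wins.

```python
def find_lib(search_paths, known_files, libname="libxgboost.so"):
    """Simulate a library search loop that forgets to break."""
    found = None
    for path in search_paths:
        candidate = f"{path}/{libname}"
        if candidate in known_files:   # stand-in for os.path.exists()
            found = candidate          # bug: no `break`, later matches override
    return found

# Two copies on the search path: the local 1.0.1 build first,
# the server's stale 0.9.0 build later.
known = {
    "/home/user1/.local/lib/xgboost/libxgboost.so",  # 1.0.1
    "/usr/local/xgboost/libxgboost.so",              # 0.9.0
}
paths = ["/home/user1/.local/lib/xgboost", "/usr/local/xgboost"]

print(find_lib(paths, known))  # → /usr/local/xgboost/libxgboost.so (the stale copy)
```

Adding a `break` after the first match (or ordering the search paths so the intended install comes last) would avoid the override.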
Thanks. I'm going to close this then.
We are going to switch to having two output files.
A prediction_summary.csv file will contain one row per prediction and will have the overall number. It will contain 3 columns:
- personId - The personId from the input data
- percentile - Where this person fits into the overall population. 1 is the lowest risk and 100 is the highest risk
- probability - The probability of the predicted outcome (respiratory failures)
The prediction_factors.csv file contains information on the factors driving each prediction. There will be multiple rows per prediction, one row for each factor. Each row will have:
- personId - The personId from the input data
- sign - 1 for positive factors (increased risk), -1 for negative factors (decreased risk)
- rank - A number from 1 to 10 that ranks the multiple factors associated with a prediction. The most significant factor associated with a prediction is 1. 2 is second, etc.
- factor_name - The name of the risk factor
- factor_value - The value of the risk factor for this patient
- factor_score - The score of this factor. Scores with larger magnitudes are more significant. These scores are a normalized version of SHAP scores.
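The two-file layout above can be sketched with Python's csv module. All row values below are invented for illustration; only the column names come from the description.

```python
import csv
import io

# prediction_summary.csv: one row per prediction with the overall numbers.
summary = io.StringIO()
w = csv.DictWriter(summary, fieldnames=["personId", "percentile", "probability"])
w.writeheader()
w.writerow({"personId": "p001", "percentile": 97, "probability": 0.42})

# prediction_factors.csv: one row per factor, ranked 1..10 per prediction.
factor_file = io.StringIO()
w = csv.DictWriter(factor_file, fieldnames=[
    "personId", "sign", "rank", "factor_name", "factor_value", "factor_score",
])
w.writeheader()
w.writerow({"personId": "p001", "sign": 1, "rank": 1,
            "factor_name": "Age", "factor_value": 78, "factor_score": 0.31})
w.writerow({"personId": "p001", "sign": -1, "rank": 2,
            "factor_name": "Diagnosis of Respiratory signs and symptoms in the previous 12 months",
            "factor_value": False, "factor_score": -0.12})

print(summary.getvalue())
print(factor_file.getvalue())
```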
Hi Dave, please let me know when this update is expected to be in production. Looking forward to having greater interpretability in the output. Thanks!
Hi, we actually pushed a change last night that simplified the files. In the end, we decided against having two separate files and instead made one file where the columns are laid out more clearly. All the columns now have simple values (rather than arrays), and the related values sit next to each other.
See https://github.com/closedloop-ai/cv19index/blob/master/examples/xgboost/example_prediction.csv