closedloop-ai / cv19index

COVID-19 Vulnerability Index
http://cv19index.com

Running the CV19 Index Predictor #3

Closed chesh27 closed 4 years ago

chesh27 commented 4 years ago

do_run(input_fpath, input_schema, model, output)

Traceback (most recent call last):
  File "", line 1, in
    do_run(input_fpath, input_schema, model, output)
  File "..\cv19index\predict.py", line 360, in do_run
    model = read_model(model_fpath)
  File "..\cv19index\io.py", line 19, in read_model
    return pickle.load(fobj)
  File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 981, in __setstate__
    _check_call(_LIB.XGBoosterLoadModelFromBuffer(handle, ptr, length))
  File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 176, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
XGBoostError: [15:34:02] C:\Jenkins\workspace\xgboost-win64_release_0.90\src\gbm\gbm.cc:20: Unknown gbm type

DaveDeCaprio commented 4 years ago

Hi, which model are you using?

Dave


DaveDeCaprio commented 4 years ago

Hi,

It might be worth upgrading to xgboost version 1.0.1 or greater as well. Not sure if that's the issue but we wrote this against version 1.0.1.

Thanks, Ben Tuttle
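For anyone wanting to check their environment against that recommendation, here is a minimal sketch using only the standard library (Python 3.8+ for `importlib.metadata`). The helper names `installed_version`, `version_tuple`, and `meets_minimum` are made up for this example and are not part of cv19index or xgboost. Note that this reports the version pip or conda has registered, which may differ from the native library that actually gets loaded at runtime.

```python
from importlib import metadata
from itertools import takewhile


def installed_version(package):
    """Return the installed version string of `package`, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.0.1' into (1, 0, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at the first part with no leading number
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum="1.0.1"):
    """True if `version` is at least `minimum` (1.0.1 is the version named in this thread)."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    found = installed_version("xgboost")
    if found is None:
        print("xgboost is not installed in this environment")
    elif meets_minimum(found):
        print(f"xgboost {found} meets the 1.0.1 minimum")
    else:
        print(f"xgboost {found} is older than 1.0.1; consider upgrading")
```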


islamgalileo commented 4 years ago

I had a similar issue when trying to load the xgboost model:

Traceback (most recent call last):
  File "", line 7, in
    do_run(input_fpath, input_schema, model, output)
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/predict.py", line 360, in do_run
    model = read_model(model_fpath)
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/io.py", line 19, in read_model
    return pickle.load(fobj)
  File "/home/user1/.local/lib/python3.6/site-packages/xgboost/core.py", line 1093, in __setstate__
    _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/xgboost/libxgboost.so: undefined symbol: XGBoosterUnserializeFromBuffer

I'm running this with xgboost 1.0.1.

DaveDeCaprio commented 4 years ago

Ok, good to know that the version isn't the issue. Were you trying to load the "xgboost" model?

islamgalileo commented 4 years ago

Yes, just trying to run the example notebook using python kernel for the notebook

DaveDeCaprio commented 4 years ago

These errors definitely point to some kind of issue with the XGBoost install. It looks like the C libraries aren't installed correctly.

DaveDeCaprio commented 4 years ago

I think we are going to recommend using the conda xgboost install. We will add in some directions on that.

chesh27 commented 4 years ago

I was also trying to run the XGBoost model in the Tutorial notebook. I upgraded XGBoost to version 1.0.1 and that seems to have resolved the issue. Thanks! Now, if I want to use the Logistic Regression model, do I simply need to replace the model reference in the Tutorial with the following?

model = resource_filename("cv19index", "resources/logistic_regression/lr.p")

DaveDeCaprio commented 4 years ago

The logistic regression model isn't currently hooked up the same way. We are going to address that with some new updates coming over the weekend.

chesh27 commented 4 years ago

Ok, thank you! I ran the XGBoost model on my data and am seeing a lot of ['Diagnosis of Respiratory signs and symptoms in the previous 12 months'] in the "neg factors" column and not at all in the "pos factors" column.

I am also seeing a lot of ['Age', 'Diagnosis of Neoplasm-related encounters in the previous 12 months', 'Diagnosis of Benign neoplasms in the previous 12 months'] in the "pos factors" column.

Shouldn't we expect respiratory issues to show up in the positive factors column, since those would increase the patients' risk for COVID-19? Am I interpreting the results correctly?

DaveDeCaprio commented 4 years ago

In the output there should be a corresponding field called "pos_patient_values". This is an array that lines up with the pos_factors and gives you the actual value of the variable.

So if you see "Diagnosis of Respiratory signs and symptoms in the previous 12 months" as a negative factor, that should be paired with a value of "False". That means that the fact that a diagnosis wasn't seen contributed to a decrease in risk.

We will try to think of a clearer way to present this. In our application we have a UI that presents this more clearly, so we aren't as used to putting this all in a CSV.
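To make the pairing described above concrete, here is a small sketch that zips each factor name with its patient value. The field names "pos_factors" and "pos_patient_values" come from the comment above; "neg_factors" and "neg_patient_values" are assumed by analogy, and the example row is hypothetical. It also assumes the array columns have already been parsed into Python lists.

```python
def pair_factors(row, sign="pos"):
    """Pair each factor name with the patient's value for that factor.

    `sign` is "pos" or "neg". The "neg_*" column names are an assumption
    made by analogy with "pos_factors" / "pos_patient_values".
    """
    names = row[f"{sign}_factors"]
    values = row[f"{sign}_patient_values"]
    if len(names) != len(values):
        raise ValueError("factor names and patient values do not line up")
    return list(zip(names, values))


# Hypothetical row, shaped after the case discussed in this thread.
example_row = {
    "pos_factors": ["Age"],
    "pos_patient_values": [84],
    "neg_factors": [
        "Diagnosis of Respiratory signs and symptoms in the previous 12 months"
    ],
    "neg_patient_values": [False],
}

if __name__ == "__main__":
    for name, value in pair_factors(example_row, "neg"):
        # A value of False means the diagnosis was absent, which lowers risk.
        print(f"{name} = {value} (decreases risk)")
```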

DaveDeCaprio commented 4 years ago

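We are going to switch to having two output files.

A prediction_summary.csv file will contain one row per prediction and will have the overall number. It will contain 3 columns:

  • personId - The personId from the input data
  • percentile - Where this person fits into the overall population. 1 is the lowest risk and 100 is the highest risk
  • probability - The probability of the predicted outcome (respiratory failures)

The prediction_factors.csv file contains information on the factors driving each prediction. There will be multiple rows per prediction, one row for each factor. Each row will have:

  • personId - The personId from the input data
  • sign - 1 for positive factors (increased risk), -1 for negative factors (decreased risk)
  • rank - A number from 1 to 10 that ranks the multiple factors associated with a prediction. The most significant factor associated with a prediction is 1. 2 is second, etc.
  • factor_name - The name of the risk factor
  • factor_value - The value of the risk factor for this patient
  • factor_score - The score of this factor. Scores with larger magnitudes are more significant. These scores are a normalized version of SHAP scores.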

islamgalileo commented 4 years ago

I think I found the root cause of my issue. An old version of xgboost (0.9.0) was installed on the server, although my local folder had version 1.0.1. The core.py file in xgboost tries to locate the libxgboost.so library file. It loops over the candidate paths and doesn't exit the loop after finding the correct libxgboost.so file. In my case it found the 1.0.1 library and then overrode it with the 0.9.0 one it found later, which caused the issue.
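The difference between the two loop behaviors described above can be sketched as follows. This is an illustration only, not xgboost's actual code: `find_first_library` and `find_last_library` are hypothetical helpers, and the search directories are whatever the caller passes in.

```python
import os


def find_first_library(search_dirs, filename="libxgboost.so"):
    """Return the first matching library path and stop searching."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate  # exit as soon as a match is found
    return None


def find_last_library(search_dirs, filename="libxgboost.so"):
    """Mimic the behavior described above: keep looping, so a later match wins."""
    found = None
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            found = candidate  # a later directory overrides an earlier one
    return found
```

With a user-local directory listed before a system-wide one, the first function returns the user-local library, while the second returns the system-wide one, which matches the symptom of an older library shadowing a newer install.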

DaveDeCaprio commented 4 years ago

Thanks. I'm going to close this then.

chesh27 commented 4 years ago

We are going to switch to having two output files.

A prediction_summary.csv file will contain one row per prediction and will have the overall number. It will contain 3 columns:

  • personId - The personId from the input data
  • percentile - Where this person fits into the overall population. 1 is the lowest risk and 100 is the highest risk
  • probability - The probability of the predicted outcome (respiratory failures)

The prediction_factors.csv file contains information on the factors driving each prediction. There will be multiple rows per prediction, one row for each factor. Each row will have:

  • personId - The personId from the input data
  • sign - 1 for positive factors (increased risk), -1 for negative factors (decreased risk)
  • rank - A number from 1 to 10 that ranks the multiple factors associated with a prediction. The most significant factor associated with a prediction is 1. 2 is second, etc.
  • factor_name - The name of the risk factor
  • factor_value - The value of the risk factor for this patient
  • factor_score - The score of this factor. Scores with larger magnitudes are more significant. These scores are a normalized version of SHAP scores.

Hi Dave, please let me know when this update is expected to be in production. Looking forward to having greater interpretability in the output. Thanks!
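For reference, a file in the proposed prediction_factors.csv layout could be read along these lines. This is a sketch under assumptions: the sample rows and scores are invented for illustration, the negative sign is taken to be -1, and `top_factors` is a hypothetical helper, not part of cv19index.

```python
import csv
import io

# Invented sample data in the proposed prediction_factors.csv layout.
SAMPLE = """\
personId,sign,rank,factor_name,factor_value,factor_score
p1,1,2,Diagnosis of Benign neoplasms in the previous 12 months,True,0.10
p1,1,1,Age,84,0.30
p1,-1,1,Diagnosis of Respiratory signs and symptoms in the previous 12 months,False,-0.20
"""


def top_factors(csv_text, person_id, sign):
    """Return (name, value, score) for one person and sign, ordered by rank.

    `sign` is 1 for factors that increase risk and -1 for factors that
    decrease it (the -1 convention is an assumption of this sketch).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        r for r in reader
        if r["personId"] == person_id and int(r["sign"]) == sign
    ]
    rows.sort(key=lambda r: int(r["rank"]))  # rank 1 is the most significant
    return [
        (r["factor_name"], r["factor_value"], float(r["factor_score"]))
        for r in rows
    ]


if __name__ == "__main__":
    for name, value, score in top_factors(SAMPLE, "p1", 1):
        print(f"+ {name} = {value} (score {score})")
    for name, value, score in top_factors(SAMPLE, "p1", -1):
        print(f"- {name} = {value} (score {score})")
```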

DaveDeCaprio commented 4 years ago

Hi, we actually pushed a change last night that simplified the files. In the end, we decided against having two separate files, but made one file where the columns are laid out more clearly. All the columns now have simple values (rather than arrays) and the relevant values are next to each other.

See https://github.com/closedloop-ai/cv19index/blob/master/examples/xgboost/example_prediction.csv