Status: Open. CaseyTa opened this issue 2 weeks ago.
I tried looking into this a bit.
In my dev environment, the /app/models directory does not have drug_target.pkl. As a test, I manually downloaded the file from GitHub into my running container:
wget https://github.com/MaastrichtU-IDS/predict-drug-target/raw/2f2d9aa1591f1181ba07a5fff69aeb112e4ec371/models/drug_target.pkl
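Since the first failure was simply a missing model file, one option is to have the service check for it when it loads the model, so the problem shows up at startup rather than as a confusing error on the first TRAPI query. This is a minimal sketch, assuming the model is loaded with plain `pickle`. The `load_model` helper is hypothetical, not the repository's actual loader, and the path is the one mentioned above.

```python
import pickle
from pathlib import Path

# Path taken from the issue; adjust if the image lays out models differently.
MODEL_PATH = Path("/app/models/drug_target.pkl")


def load_model(path: Path = MODEL_PATH):
    """Load a pickled model, failing fast with a clear message if it is missing."""
    if not path.is_file():
        # Report the real cause immediately instead of failing later
        # when the first prediction request tries to use the model.
        raise FileNotFoundError(
            f"Model file not found at {path}; was it copied into the image?"
        )
    with path.open("rb") as f:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        return pickle.load(f)
```

Calling `load_model()` once at application startup would make a missing file a deployment error rather than a per-query one.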
The error message then becomes:
Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00, 3.17it/s]
2024-08-24 15:45:11,514 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 15:45:14,171 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: 'Booster' object has no attribute 'predict_proba'
I loaded the pickle file manually and saw that it's an xgboost.Booster object, which doesn't have a predict_proba method but does have predict. I also noticed that the model appears to have been evaluated with the predict function during training, so I changed predict_proba to predict. Now I get:
Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00, 3.19it/s]
2024-08-24 15:51:47,257 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 15:51:49,890 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)
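The two errors so far point at the same mismatch: the calling code expects the scikit-learn style wrapper (`xgboost.XGBClassifier`), which has `predict_proba` and accepts a DataFrame, while the pickle holds a raw `xgboost.Booster`, whose `predict` only accepts a `DMatrix`. This is a minimal sketch of a helper that accepts either object, assuming the booster was trained with the `binary:logistic` objective so that `Booster.predict` returns positive-class probabilities. `predict_scores` is a hypothetical name, and the DataFrame-to-DMatrix converter is passed in so the sketch itself does not import xgboost. It does not address the later `name 'predicted' is not defined` error, which comes from code not shown here.

```python
from typing import Any, Callable, List


def predict_scores(model: Any, features: Any,
                   to_dmatrix: Callable[[Any], Any]) -> List[float]:
    """Return one positive-class score per input row.

    Handles both the scikit-learn wrapper (has predict_proba) and a raw
    Booster (predict only, DMatrix input). Assumes a binary:logistic
    objective, so Booster.predict already returns probabilities.
    """
    if hasattr(model, "predict_proba"):
        # scikit-learn style: one column per class; column 1 is the positive class.
        return [row[1] for row in model.predict_proba(features)]
    # Raw Booster: no predict_proba, and predict() requires a DMatrix.
    return list(model.predict(to_dmatrix(features)))
```

In the service this would be called as `predict_scores(model, df, to_dmatrix=xgboost.DMatrix)`, since `xgboost.DMatrix` can be constructed directly from a pandas DataFrame.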
I then tried converting the DataFrame to a DMatrix and got:
Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00, 3.17it/s]
2024-08-24 16:10:59,814 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 16:11:02,804 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: name 'predicted' is not defined
I'm clearly going down the wrong path here. @micheldumontier, is anyone else available to continue troubleshooting?
Hi Casey, I was also investigating this issue and found a couple of problems. First, this object was saved improperly and doesn't comply with the expected interface (this is related to whether or not you save the weights of the booster). Second, even after fixing that, I found that the input dimension used by the application doesn't match that of the training dataset. So I've resorted to rebuilding the prediction model and revising the code. Still working on this.
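An input-dimension mismatch like the one described above is easier to diagnose if the service checks it explicitly before predicting, instead of failing somewhere inside xgboost. This is a minimal sketch, assuming the features arrive as equal-length rows (for a DataFrame, pass `df.to_numpy()`, since iterating a DataFrame directly yields column labels, not rows). `check_feature_dim` is a hypothetical helper; the expected width could come from `booster.num_features()` for a raw Booster or `clf.n_features_in_` for an `XGBClassifier`.

```python
from typing import Sequence


def check_feature_dim(expected: int, rows: Sequence[Sequence[float]]) -> None:
    """Raise ValueError if any row's width differs from the model's input width."""
    for i, row in enumerate(rows):
        if len(row) != expected:
            raise ValueError(
                f"row {i} has {len(row)} features, but the model "
                f"was trained on {expected}"
            )
```

Failing with the two numbers side by side makes it clear whether the mismatch comes from the embedding sizes or from how drug and target vectors are concatenated.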
Describe the problem
TRAPI queries for drug-target predictions do not return any results. This is reproducible in dev and in all ITRB environments using the example query suggested in the documentation.
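To turn "no results" into a repeatable check across environments, a test can post a query and count the results in the TRAPI response. The example query from the documentation is not reproduced in this issue, so the query graph below is a hypothetical placeholder in TRAPI 1.x shape, and `count_results` is a hypothetical helper. The `message.results` field is part of the TRAPI response format.

```python
from typing import Any, Dict

# Hypothetical placeholder query graph (TRAPI 1.x shape); substitute the
# example query from the documentation when reproducing the issue.
EXAMPLE_QUERY: Dict[str, Any] = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"categories": ["biolink:Drug"]},
                "n1": {"categories": ["biolink:Protein"]},
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:interacts_with"],
                }
            },
        }
    }
}


def count_results(trapi_response: Dict[str, Any]) -> int:
    """Return the number of results, treating a missing or null list as 0."""
    message = trapi_response.get("message") or {}
    return len(message.get("results") or [])
```

A smoke test would POST the query to the deployment's `/query` endpoint and assert that `count_results(response_json) > 0`; the base URL differs between dev and the ITRB environments.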
ITRB CloudWatch logs show the following: