MaastrichtU-IDS / translator-openpredict

🔮🐍 A package to help serve predictions of biomedical concepts associations as Translator Reasoner API
https://openpredict.semanticscience.org
MIT License

Drug-Target predictions are broken #58

Open CaseyTa opened 2 weeks ago

CaseyTa commented 2 weeks ago

Describe the problem

TRAPI queries for drug-target predictions do not return any results. This is tested and reproducible in dev and in all ITRB environments, using the example query suggested in the documentation:

{
    "message": {
        "query_graph": {
            "edges": {"e01": {"object": "n1", "predicates": ["biolink:interacts_with"], "subject": "n0"}},
            "nodes": {
                "n0": {
                    "categories": ["biolink:Drug"],
                    "ids": ["PUBCHEM.COMPOUND:5329102", "PUBCHEM.COMPOUND:4039", "CHEMBL.COMPOUND:CHEMBL1431"]},
                "n1": {
                    "categories": ["biolink:Protein"],
                    "ids": ["UniProtKB:O75251"]
                }
            }
        }
    },
    "query_options": {"max_score": 1, "min_score": 0.1, "n_results": 10}
}
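
For anyone who wants to reproduce this from a script, here is a minimal sketch that builds the same request body and posts it. The function names are hypothetical, and the endpoint URL is an assumption: TRAPI services conventionally expose a POST `/query` route, but the exact base URL depends on the deployment being tested.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_drug_target_query(drug_ids: List[str], target_ids: List[str]) -> Dict[str, Any]:
    """Build the request body from this report (hypothetical helper name)."""
    return {
        "message": {
            "query_graph": {
                "edges": {
                    "e01": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:interacts_with"],
                    }
                },
                "nodes": {
                    "n0": {"categories": ["biolink:Drug"], "ids": drug_ids},
                    "n1": {"categories": ["biolink:Protein"], "ids": target_ids},
                },
            }
        },
        # Same options as in the example query above.
        "query_options": {"max_score": 1, "min_score": 0.1, "n_results": 10},
    }


def post_trapi_query(url: str, payload: Dict[str, Any], timeout: float = 60.0) -> Dict[str, Any]:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `post_trapi_query(base_url + "/query", build_drug_target_query([...], ["UniProtKB:O75251"]))` with the IDs from the example should, if the route assumption holds, return a TRAPI response whose `message` can be inspected for results.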

ITRB CloudWatch logs show the following:

[2024-08-23 22:32:26 +0000] [81] [INFO] 🔮⏳️ Getting predictions for: ['PUBCHEM.COMPOUND:5329102', 'PUBCHEM.COMPOUND:4039', 'CHEMBL.COMPOUND:CHEMBL1431'] | []
[2024-08-23 22:32:26 +0000] [81] [ERROR] Error getting the predictions: [Errno 2] No such file or directory: 'models/drug_target.pkl'
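
The path in the error is relative (`models/drug_target.pkl`), so it is resolved against the working directory of the server process. A startup check along the following lines could surface a missing model file before any query arrives. This is only a sketch: the function name is hypothetical and it is not part of the OpenPredict code base.

```python
from pathlib import Path
from typing import Iterable, List


def missing_model_files(model_dir: str, required: Iterable[str]) -> List[str]:
    """Return the required file names that are absent from model_dir."""
    # A relative model_dir is resolved against the current working directory,
    # the same way the failing open() call in the log would resolve it.
    base = Path(model_dir).resolve()
    return [name for name in required if not (base / name).is_file()]
```

For example, `missing_model_files("models", ["drug_target.pkl"])` returns `["drug_target.pkl"]` when the file is absent, which the application could turn into a clear startup error instead of a per-request failure.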

CaseyTa commented 2 weeks ago

I tried looking into this a bit.

In my dev environment, the /app/models directory does not contain drug_target.pkl. As a test, I manually downloaded the file from GitHub into my running container:

wget https://github.com/MaastrichtU-IDS/predict-drug-target/raw/2f2d9aa1591f1181ba07a5fff69aeb112e4ec371/models/drug_target.pkl

Then the error message becomes:

Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00,  3.17it/s]
2024-08-24 15:45:11,514 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 15:45:14,171 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: 'Booster' object has no attribute 'predict_proba'

I loaded the pickle file manually and saw that it's an xgboost.Booster object, which doesn't have a predict_proba method but does have predict. It also looks like the model was evaluated with the predict function during training, so I changed predict_proba to predict. Now I get:

Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00,  3.19it/s]
2024-08-24 15:51:47,257 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 15:51:49,890 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)

I then tried converting the DataFrame to DMatrix and got:

Check drugs in Vector DB, or get SMILES: 100%|██████████| 1/1 [00:00<00:00,  3.17it/s]
2024-08-24 16:10:59,814 INFO: [embeddings:compute_target_embedding] Retrieved 4962 targets
2024-08-24 16:11:02,804 ERROR: [trapi_parser:resolve_trapi_query] Error getting the predictions: name 'predicted' is not defined
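
The first two errors are consistent with the xgboost API: a raw `xgboost.Booster` exposes `predict` rather than `predict_proba`, and its `predict` accepts only a `DMatrix`. A minimal sketch of a loader-agnostic scoring helper is below. The function name is hypothetical, and the Booster branch assumes the model was trained with a `binary:logistic` objective, which the thread does not confirm.

```python
from typing import Any, List


def positive_class_scores(model: Any, features: Any) -> List[float]:
    """Return one positive-class score per row for either model type."""
    if hasattr(model, "predict_proba"):
        # scikit-learn style estimators (e.g. xgboost.XGBClassifier) return
        # one [p_negative, p_positive] row per sample.
        return [float(row[1]) for row in model.predict_proba(features)]

    # Otherwise assume a raw xgboost.Booster. Imported lazily so the branch
    # above works even when xgboost is not installed.
    import xgboost as xgb

    dmatrix = xgb.DMatrix(features)  # accepts a pandas DataFrame or NumPy array
    # ASSUMPTION: with a binary:logistic objective, predict() already returns
    # the positive-class probability for each row.
    return [float(score) for score in model.predict(dmatrix)]
```

This does not address the third error (`name 'predicted' is not defined`), which points at a variable name in the calling code rather than at the model interface.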

I'm clearly going down a wrong path here. @micheldumontier Is anyone else available to continue troubleshooting?

micheldumontier commented 1 week ago

Hi Casey, I was also investigating this and found a couple of issues. First, the model object was improperly saved and doesn't comply with the expected interface (this is related to whether you save the weights of the booster or not). Second, even with that fixed, the input dimension used by the application doesn't match the training dataset. So I've resorted to rebuilding the prediction model and revising the code. Still working on this.
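
A dimension mismatch like the one described here can be caught before prediction with a small guard. The sketch below is an illustration only, with hypothetical function names. It relies on `Booster.num_features()` for raw boosters and on the `n_features_in_` attribute that fitted scikit-learn style estimators expose.

```python
from typing import Any, Optional


def expected_feature_count(model: Any) -> Optional[int]:
    """Return the number of input features the model was trained on, if known."""
    if hasattr(model, "num_features"):
        # Raw xgboost.Booster objects report their width via num_features().
        return int(model.num_features())
    if hasattr(model, "n_features_in_"):
        # Fitted scikit-learn style estimators expose n_features_in_.
        return int(model.n_features_in_)
    return None


def check_feature_width(model: Any, n_columns: int) -> None:
    """Raise ValueError when the input width differs from the training width."""
    expected = expected_feature_count(model)
    if expected is not None and expected != n_columns:
        raise ValueError(
            f"model expects {expected} features but the input has {n_columns}"
        )
```

With a pandas DataFrame of embeddings, this would be called as `check_feature_width(model, df.shape[1])` just before scoring, so the mismatch is reported explicitly instead of surfacing as a lower-level error.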