NCATSTranslator / minihackathons

MIT License

BTE 598 error in Workflow D.6 query #115

Open vgardner-renci opened 3 years ago

vgardner-renci commented 3 years ago

https://arax.ncats.io/?source=ARS&id=0c5747a0-f16a-4007-88af-0f87770dcff7

vgardner-renci commented 3 years ago

{
  "callback": "http://localhost:8000/ars/api/messages/b3eef90a-487b-47a4-85f6-ae10e8b7b912",
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "object": "n1",
          "subject": "n0"
        },
        "e1": {
          "object": "n2",
          "subject": "n1"
        },
        "e2": {
          "object": "n3",
          "subject": "n2"
        }
      },
      "nodes": {
        "n0": {
          "categories": [
            "biolink:ChemicalSubstance"
          ],
          "ids": [
            "CHEMBL.COMPOUND:CHEMBL1431"
          ]
        },
        "n1": {
          "categories": [
            "biolink:Protein"
          ]
        },
        "n2": {
          "categories": [
            "biolink:Protein"
          ]
        },
        "n3": {
          "categories": [
            "biolink:Protein"
          ],
          "ids": [
            "UniProtKB:P02794",
            "UniProtKB:P02792"
          ]
        }
      }
    }
  },
  "validation_result": {
    "message": "",
    "size": "2 kB",
    "status": "PASS",
    "version": "1.1.1"
  }
}

andrewsu commented 3 years ago

Standard three-hop explain query. BTE planned fix tracked in biothings/BioThings_Explorer_TRAPI#112

aj95b commented 2 years ago

Is this (and https://github.com/NCATSTranslator/minihackathons/issues/108) expected to be resolved anytime soon? Adding ClinicalFinding (LOINC:LOINC:2276-4) for n3, and perhaps changing biolink:SmallMolecule to MolecularEntity for n0 (relative to the current form of this query in the minihackathons repo), can uncover important results from the Multiomics Wellness KP. @gglusman @rtroper

andrewsu commented 2 years ago

After some final tweaks, BTE is returning results now: https://arax.ncats.io/?r=63625138-dbc5-48b0-8ba1-9db46271a3a4

Some results appear to be coming back from the Multiomics Wellness KP. Can you confirm things are working as expected from your end?

aj95b commented 2 years ago

Were these results from the query above or the one here https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowD/D.6_metformin-ferritin.json (or a modification of this one as suggested in my previous comment)?

andrewsu commented 2 years ago

These were from the version of D.6 in the GitHub repo. On your suggestion, I'm not sure how/why adding a ClinicalFinding on n3 (which is a biolink:Protein in the original query) would be useful, and n1 is a biolink:Protein, not a biolink:SmallMolecule. So I'm not sure I understand your suggestion. Regardless, you are welcome to tweak the TRAPI query as you see fit and post it either to the ARS or to the BTE endpoint (https://api.bte.ncats.io/v1/query). The latter should return slightly faster...
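For anyone who wants to try the direct-to-BTE route mentioned above, here is a minimal sketch of preparing the POST with the Python standard library. The endpoint URL is the one quoted in this thread; the helper name `build_trapi_request` and the one-hop `example_graph` are illustrative assumptions, not the D.6 query from the repo. The request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint quoted in the thread; swap in an ARS URL to go through the ARS instead.
BTE_QUERY_URL = "https://api.bte.ncats.io/v1/query"


def build_trapi_request(query_graph: dict, url: str = BTE_QUERY_URL) -> urllib.request.Request:
    """Wrap a query graph in a TRAPI message and prepare (not send) a POST."""
    body = json.dumps({"message": {"query_graph": query_graph}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical one-hop graph for illustration only, not the D.6 query itself.
example_graph = {
    "nodes": {
        "n0": {"ids": ["CHEMBL.COMPOUND:CHEMBL1431"], "categories": ["biolink:SmallMolecule"]},
        "n1": {"categories": ["biolink:Protein"]},
    },
    "edges": {"e0": {"subject": "n0", "object": "n1"}},
}

request = build_trapi_request(example_graph)

# To actually send it (network call, may take a long time):
# with urllib.request.urlopen(request, timeout=600) as resp:
#     response = json.load(resp)
```

Keeping the build step separate from the send step makes it easy to inspect or save the exact payload before posting it to either endpoint.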

sierra-moxon commented 2 years ago

@aj95b - if it's not too much trouble, would you mind giving me a flavor of the kinds of answers to the question asked by this workflow, that you could provide? From a data modeling/conflation point of view, it would be helpful to get your perspective on a question that can be answered with both Ferritin as a protein and as a phenotype (phenotypic feature as a parent of clinical finding). thx in advance!

aj95b commented 2 years ago

@andrewsu I am sorry, I meant n0, not n1, to be changed into MolecularEntity (it seems to be returning results from your end regardless, though). I am trying to run the query from scratch on my end; it is taking a while, and I'll get back as soon as I can confirm. I am doing this because last time it returned an Error 598 on my end.

@sierra-moxon Yes, I am running the modification of the query (and just searching in the graph locally) to get this.
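The modified query itself is not posted in this thread, so the following is only a sketch of one way to read the suggestion: switch n0 to `biolink:MolecularEntity` and add the ClinicalFinding category and the LOINC ID (copied verbatim from the earlier comment) to n3. The function name `apply_suggested_tweaks` is hypothetical, and appending to n3 rather than replacing its existing values is an assumption.

```python
import copy


def apply_suggested_tweaks(query: dict) -> dict:
    """Return a modified copy of a TRAPI query; the input is left untouched."""
    tweaked = copy.deepcopy(query)
    nodes = tweaked["message"]["query_graph"]["nodes"]

    # Assumption: n0's category is replaced outright.
    nodes["n0"]["categories"] = ["biolink:MolecularEntity"]

    # Assumption: n3 keeps its Protein category and IDs and gains the new ones.
    n3 = nodes["n3"]
    n3.setdefault("categories", []).append("biolink:ClinicalFinding")
    n3.setdefault("ids", []).append("LOINC:LOINC:2276-4")
    return tweaked
```

Working on a deep copy keeps the original D.6 query available for a side-by-side comparison of results.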

aj95b commented 2 years ago

@andrewsu Confirming the results

aj95b commented 2 years ago

Once again, unable to see results from Wellness KP, for this and also query D.4 @gglusman @andrewsu @colleenXu @newgene @edeutsch

gglusman commented 2 years ago

@andrewsu @colleenXu @edeutsch Any idea what the issue is here? Thanks!

edeutsch commented 2 years ago

I'm afraid I don't really understand the thread here. I do note from the TRAPI above that this:

  "callback": "http://localhost:8000/ars/api/messages/b3eef90a-487b-47a4-85f6-ae10e8b7b912",

will not work if sent to our /asyncquery endpoint
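The comment above does not spell out the reason, but presumably a `localhost` callback resolves to whichever machine receives the request, so a remote server cannot use it to reach the sender. A small pre-flight check along these lines could catch that before submitting; the function name `is_loopback_callback` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_callback(callback_url: str) -> bool:
    """True if the callback host is localhost or a loopback IP literal."""
    host = urlsplit(callback_url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A regular hostname rather than an IP literal.
        return False
```

For example, the callback in the TRAPI pasted earlier in this thread would be flagged, while a publicly resolvable host name would not.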

andrewsu commented 2 years ago

So I just reran the query (the D.6 query in the GitHub repo) against the asyncquery endpoint for BTE, and the results can be downloaded here: http://api.bte.ncats.io/v1/check_query_status/3leuzKk8av. There are 217 results, and 278 nodes and 774 edges in the knowledge_graph. The overall file is 40 MB, mostly because many of the edges come from the text mining provider, which has extensive edge attributes. We have been meaning to initiate a discussion with them (if we haven't already) about potentially reducing the verbosity in edge attributes... I hope that's helpful?
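Counts like the ones above can be reproduced from a downloaded response with a few lines of Python. This sketch assumes the usual TRAPI 1.x layout, where `message.results` is a list and `knowledge_graph.nodes` / `knowledge_graph.edges` are objects keyed by ID; the function name `summarize_trapi_response` is made up for illustration.

```python
def summarize_trapi_response(response: dict) -> dict:
    """Count results, knowledge-graph nodes, and knowledge-graph edges."""
    message = response.get("message") or {}
    kg = message.get("knowledge_graph") or {}
    return {
        "results": len(message.get("results") or []),
        "nodes": len(kg.get("nodes") or {}),
        "edges": len(kg.get("edges") or {}),
    }
```

After `json.load`-ing the downloaded file, calling `summarize_trapi_response(data)` gives a quick sanity check without opening a 40 MB document by hand.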

aj95b commented 2 years ago

I ran the query; after hours, the BTE status still says Running/200. Basically, Multiomics Wellness has edges that should be returned as results via BTE too.
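When an async job can sit in "Running" for hours, it helps to poll with an explicit deadline rather than wait indefinitely. Below is a rough sketch: `fetch_status` is a caller-supplied function (hypothetical) that returns the job's current status string, since the exact shape of the status response is not shown in this thread. Only the "Running" value comes from the comment above; any other value is treated as final.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the status is no longer "Running" or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status != "Running":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still Running after {timeout_s} s")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.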

andrewsu commented 2 years ago

@aj95b yes, there are some issues right now that lead to timeout errors. It sometimes finishes just fine (like the job linked in my last comment), but often not... We are looking into it...

For the successful job linked above, in the logs I can see that the multiomics wellness KP is being called and BTE is retrieving results (e.g., https://biothings.ncats.io/multiomics_wellness_kp/query?q=subject.id:%22KEGG.COMPOUND:C07151%22%20AND%20predicate.type:correlated_with%20AND%20object.type:Protein%20AND%20_exists_:object.UniProtKB), but those results apparently are not included in the final results assembly. Can you provide more info on what paths you're expecting to see?
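To check individual hops against the KP by hand, the URL in the log line above can be rebuilt for other subjects or predicates. This sketch copies the field names (`subject.id`, `predicate.type`, `object.type`, `_exists_`) from that single URL; whether other fields or values are supported is not something this thread confirms, and `build_kp_url` is a hypothetical helper name.

```python
from urllib.parse import quote

KP_QUERY_URL = "https://biothings.ncats.io/multiomics_wellness_kp/query"


def build_kp_url(subject_id: str, predicate: str, object_type: str, object_id_field: str) -> str:
    """Build a KP query URL in the same form as the one seen in the BTE logs."""
    q = (
        f'subject.id:"{subject_id}" AND predicate.type:{predicate} '
        f"AND object.type:{object_type} AND _exists_:{object_id_field}"
    )
    # Keep ':' unescaped so the output matches the logged URL character for character.
    return f"{KP_QUERY_URL}?q={quote(q, safe=':')}"


url = build_kp_url("KEGG.COMPOUND:C07151", "correlated_with", "Protein", "object.UniProtKB")
```

Comparing the edges returned by such a URL with the edges in BTE's final knowledge_graph is one way to narrow down where results are being dropped.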

brettasmi commented 2 years ago

FWIW, during recent discussions with the NCATS folks running the demo and the recent minihackathon sessions, we've decided to mostly focus on query D.4 and some derivatives. As such, I don't think D.6 will be run during the demo.

That said, we have been testing a similar query that links tryptophan and kynurenine (it's the green edge in this figure). It's not in the repo yet, but the current state is as follows:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      },
      "nodes": {
        "n00": {
          "ids": ["DRUGBANK:DB00150"],
          "categories": ["biolink:SmallMolecule"]
        },
        "n01": {
          "ids": ["KEGG.COMPOUND:C02700"],
          "categories": ["biolink:SmallMolecule"]
        }
      }
    }
  }
}

If you'd like to optimize your investigation and dev time, I'd suggest you invest there. We haven't seen much come back from the clinical (or multiomic) KPs, so the query may be dropped altogether.

aj95b commented 2 years ago

@brettasmi There are results from Multiomics Wellness via ARAX for query D.6. There could be even more results (not saying they would all be relevant), but there is a limitation on one of the interim edges that were expected to be returned via BTE.