cannin / enhance_nlp_interaction_network_gsoc2020

Add Count of INDRA Statements for Individual Terms #10

Open cannin opened 4 years ago

cannin commented 4 years ago

Add another column, INDRA_QUERY_TERM_STATEMENT_COUNT. Use the following example code:

# GET QUERY GROUNDING ----
import requests 
from urllib.parse import urljoin

grounding_service_url = 'http://grounding.indra.bio/'

# Example query terms; the last assignment is the one that is used
txt = 'BRAF'
txt = 'topotecan'

resp = requests.post(urljoin(grounding_service_url, 'ground'), json={'text': txt})
grounding_results = resp.json()
grounding_results 

# TODO: Test if grounding_results has entries
term_id = grounding_results[0]['term']['id']
term_db = grounding_results[0]['term']['db']
term = term_id + '@' + term_db
term

# Get statements for query term
from indra.sources import indra_db_rest

out = indra_db_rest.get_statements(agents=[term])
out.statements
len(out.statements)
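The TODO above (checking that grounding_results has entries) could be handled along these lines. This is only a sketch: build_agent_term is a hypothetical helper name, and the sample response contains placeholder values with just the two fields the example code reads.

```python
def build_agent_term(grounding_results):
    """Return 'id@db' for the top-ranked grounding, or None if there is none."""
    # An empty list means the service found no grounding for the text
    if not grounding_results:
        return None
    top_term = grounding_results[0]['term']
    return top_term['id'] + '@' + top_term['db']


# Hypothetical response holding only the fields used here (placeholder values)
sample = [{'term': {'db': 'DB', 'id': 'ID1'}}]
print(build_agent_term(sample))
print(build_agent_term([]))
```

When the helper returns None, the statement count for that query term can be recorded as 0 without calling indra_db_rest at all.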

cannin commented 4 years ago

The harder challenge: return only statements from specific source_apis (e.g., reach), like this one:

        "evidence": [
            {
                "source_api": "reach",
                "pmid": "28972042",
                "text": "TMCO1 dysregulates cell cycle progression via suppression of the AKT pathway, and S60 of the TMCO1 protein is crucial for its tumor suppressor roles.",
                "annotations": {
                    "found_by": "Negative_activation_syntax_1_verb",
                    "agents": {
                        "raw_text": [
                            "TMCO1",
                            "cell cycle"
                        ]
                    },

I converted the statements to JSON with stmts_to_json ('from indra.statements.statements import stmts_to_json'). We might try to submit a PR related to this.
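As a dependency-free starting point, the per-source counts can be tallied directly on the JSON. This is a minimal sketch that assumes the input is a list of statement dicts, each with an "evidence" list whose entries carry "source_api" as in the fragment above. count_statements_by_source is a hypothetical name and the sample data is made up.

```python
from collections import Counter


def count_statements_by_source(stmts_json):
    """Count, per source_api, the statements with at least one evidence from it."""
    counts = Counter()
    for stmt in stmts_json:
        # Use a set so a statement counts once per source, however many
        # evidence entries that source contributed
        sources = {ev.get('source_api') for ev in stmt.get('evidence', [])}
        sources.discard(None)
        counts.update(sources)
    return counts


# Made-up statements containing only the fields this sketch reads
sample_stmts = [
    {'type': 'Inhibition',
     'evidence': [{'source_api': 'reach'}, {'source_api': 'reach'}]},
    {'type': 'Activation',
     'evidence': [{'source_api': 'sparser'}, {'source_api': 'reach'}]},
    {'type': 'Complex', 'evidence': [{'source_api': 'biopax'}]},
]
counts = count_statements_by_source(sample_stmts)
print(counts['reach'])
```

counts['reach'] would then be the value for a reach-only version of the statement count column.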

cannin commented 4 years ago

You might want to message the INDRA team to see if they already have this somewhere: a function that filters statements based on some of their properties. It should be a fairly independent function.

I have tackled similar challenges with jsonpath (https://github.com/h2non/jsonpath-ng), but I am not sure if it will work here. You might want to mention this as well, though INDRA might not want the extra dependency. Example code:

import json
from jsonpath_ng.ext import parse  # the ext parser supports filter expressions


def get_jsonpath(json_file, json_str, jsonpath_expr_str):
    """Return all values matching a JSONPath expression.

    Reads JSON from json_file if given, otherwise parses json_str.
    """
    if json_file is None:
        dat = json.loads(json_str)
    else:
        with open(json_file) as f:
            dat = json.load(f)

    jsonpath_expr = parse(jsonpath_expr_str)

    # Each match wraps the matched value; keep only the values
    return [match.value for match in jsonpath_expr.find(dat)]

if __name__ == "__main__":

    # json_file = 'covid19_model_2020-03-22-03-16-47.json'
    # jsonpath_expr_str = "$..text_refs"
    # jsonpath_expr_str = "$..stmts[?(@.belief == 1)]"
    # jsonpath_expr_str = "$..stmts[?(@.stmt.type == 'IncreaseAmount')]"
    # jsonpath_expr_str = "$..stmts[?(@.stmt.obj.db_refs.UP == 'P16278')]"
    # jsonpath_expr_str = "$..stmts[?(@.stmt.evidence[*].text_refs.PMCID == 'PMC331007')]"

    json_file = None
    json_str = '[{"id": "a", "foo": [{"baz": 1}, {"baz": 2}]}, {"id": "b", "foo": [{"baz": 3}, {"baz": 4}]}]'
    # Alternative expression: '$..foo[*].baz'
    jsonpath_expr_str = '$[?(@.id == "a")].foo'

    print(get_jsonpath(json_file, json_str, jsonpath_expr_str))

cannin commented 4 years ago

This JSONPath expression retrieves what I'd like:

jsonpath_expr_str = "$[?(@.evidence[*].source_api == 'reach')]"

PritiShaw commented 4 years ago

This JSONPath expression retrieves what I'd like:

jsonpath_expr_str = "$[?(@.evidence[*].source_api == 'reach')]"

Hi Mentor, I have received a reply from Ben regarding our query (https://github.com/sorgerlab/indra/issues/1141). He pointed to the method indra.tools.assemble_corpus.filter_evidence_source(stmts_in, source_apis, policy='one', **kwargs).

This is also implemented in the INDRA REST API, documented at http://api.indra.bio:8000/, under the "Preassembly" heading.
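To make the policy argument concrete, here is a rough pure-Python sketch of the same idea applied to statement JSON. It is not INDRA's implementation. Only policy='one' appears in the signature above; the 'all' and 'none' branches follow my understanding of the INDRA documentation and should be checked against it. filter_json_by_evidence_source is a hypothetical name and the sample data is made up.

```python
def filter_json_by_evidence_source(stmts_json, source_apis, policy='one'):
    """Keep statement dicts according to which source_apis back their evidence."""
    wanted = set(source_apis)
    kept = []
    for stmt in stmts_json:
        found = {ev.get('source_api') for ev in stmt.get('evidence', [])}
        if policy == 'one':
            # Evidence from at least one of the requested sources
            keep = bool(found & wanted)
        elif policy == 'all':
            # Assumed meaning: evidence from every requested source
            keep = wanted <= found
        elif policy == 'none':
            # Assumed meaning: no evidence from any requested source
            keep = not (found & wanted)
        else:
            raise ValueError('Unknown policy: %s' % policy)
        if keep:
            kept.append(stmt)
    return kept


# Made-up statements containing only the fields this sketch reads
stmts = [
    {'id': 'A', 'evidence': [{'source_api': 'reach'}]},
    {'id': 'B', 'evidence': [{'source_api': 'reach'},
                             {'source_api': 'sparser'}]},
    {'id': 'C', 'evidence': [{'source_api': 'biopax'}]},
]
reach_only = filter_json_by_evidence_source(stmts, ['reach'])
print(len(reach_only))
```

With INDRA installed, the equivalent call on Statement objects would presumably be filter_evidence_source(out.statements, ['reach'], policy='one'), and len() of its result would give the reach-only count.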