biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

Handling provenance situation C #314

Closed colleenXu closed 2 years ago

colleenXu commented 2 years ago

I'll add details as they emerge.

We plan to update some APIs so that their x-bte-response-mapping has something like this: edge-attributes: association.edge_attributes

In this situation, where the key is "edge-attributes", we want to ingest everything under it (it should be an array of objects), PRESERVE it, and use it as the edge's attributes. BTE should then continue to add its own source provenance edge attribute as well.

It is similar to the behavior of provenance situation A, which was previously addressed by Eric Zhou.
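
A minimal sketch of the behavior described above, written in Python purely for illustration (it is not BTE's actual code). The helper names `resolve_path` and `build_edge_attributes` are hypothetical, and treating the mapping value as a simple dotted path is an assumption. The BTE provenance attribute is placed first only because that is where it appears in the expected output later in this issue.

```python
from copy import deepcopy

BTE_INFORES = "infores:translator-biothings-explorer"


def resolve_path(record, dotted_path):
    """Follow a dotted path such as 'association.edge_attributes' into a dict.

    Hypothetical helper: assumes the mapping value is a plain dotted path.
    """
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_edge_attributes(record, response_mapping):
    """Return the edge attributes for one API record (provenance situation C)."""
    # BTE's own source provenance attribute, copied from the example in this issue.
    attributes = [{
        "attribute_type_id": "biolink:aggregator_knowledge_source",
        "value": [BTE_INFORES],
        "value_type_id": "biolink:InformationResource",
    }]
    path = response_mapping.get("edge-attributes")
    if path is not None:
        ingested = resolve_path(record, path)
        if isinstance(ingested, list):
            # Preserve every object unchanged, including nested "attributes".
            attributes.extend(deepcopy(ingested))
    return attributes


# Toy record and mapping, made up for this sketch.
record = {"association": {"edge_attributes": [
    {"attribute_type_id": "biolink:has_evidence_count", "value": "1"},
]}}
mapping = {"edge-attributes": "association.edge_attributes"}
attrs = build_edge_attributes(record, mapping)
```

Here `attrs` holds BTE's own attribute followed by the API's attribute objects, copied without modification.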

colleenXu commented 2 years ago

The first API that we want to do this with is the text-mining targeted association API: 978fe380a147a8641caf72320862697b. I'll update this post when I get the updated SmartAPI yaml for this API out (the one that has the edge-attributes).

colleenXu commented 2 years ago

@marcodarko The SmartAPI yaml for text-mining targeted association api (978fe380a147a8641caf72320862697b) is updated. The response mapping is here

I suggest testing by querying this API both directly and through BTE: https://pending.biothings.io/text_mining_targeted_association (direct) and http://localhost:3000/v1/smartapi/978fe380a147a8641caf72320862697b/query (through BTE)

Example

Right now, BTE is treating the response-mapping with "edge-attributes" as just another single edge attribute.

Example inside:

```
"0d7332edf2a257ce27510db68a53c41e": {
  "predicate": "biolink:treats",
  "subject": "PUBCHEM.COMPOUND:9290",
  "object": "MONDO:0002177",
  "attributes": [
    {
      "attribute_type_id": "biolink:aggregator_knowledge_source",
      "value": [
        "infores:translator-biothings-explorer"
      ],
      "value_type_id": "biolink:InformationResource"
    },
    {
      "attribute_type_id": "biolink:supporting_data_source",
      "value": [],
      "value_type_id": "biolink:InformationResource"
    },
    {
      "attribute_type_id": "biolink:primary_knowledge_source",
      "value": [],
      "value_type_id": "biolink:InformationResource"
    },
    {
      "attribute_type_id": "publications",
      "value": []
    },
    {
      "attribute_type_id": "edge-attributes",
      "value": [
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:original_knowledge_source",
          "description": "The Text Mining Provider Targeted Biolink Association KP from NCATS Translator provides text-mined assertions from the biomedical literature.",
          "value": "infores:text-mining-provider-targeted",
          "value_type_id": "biolink:InformationResource"
        },
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:supporting_data_source",
          "value": "infores:pubmed",
          "value_type_id": "biolink:InformationResource"
        },
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:has_evidence_count",
          "description": "The count of the number of sentences that assert this edge",
          "value": "1",
          "value_type_id": "biolink:EvidenceCount"
        },
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:tmkp_confidence_score",
          "description": "An aggregate confidence score that combines evidence from all sentences that support the edge",
          "value": "0.9996996",
          "value_type_id": "biolink:ConfidenceLevel"
        },
        {
          "attribute_source": "infores:pubmed",
          "attribute_type_id": "biolink:supporting_document",
          "description": "The document(s) that contains the sentence(s) that assert the Biolink association represented by the edge; pipe-delimited",
          "value": "PMID:20814559",
          "value_type_id": "biolink:Publication"
        },
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:supporting_study_result",
          "attributes": [
            {
              "attribute_source": "infores:text-mining-provider-targeted",
              "attribute_type_id": "biolink:supporting_text",
              "description": "A sentence asserting the Biolink association represented by the parent edge",
              "value": "Recommended therapy includes prevention of further absorption of the drug, inotropic therapy, calcium gluconate, and hyperinsulinemia/euglycemia therapy.",
              "value_type_id": "EDAM:data_3671"
            },
            {
              "attribute_source": "infores:pubmed",
              "attribute_type_id": "biolink:supporting_document",
              "description": "The document that contains the sentence that asserts the Biolink association represented by the parent edge",
              "value": "PMID:20814559",
              "value_type_id": "biolink:Publication",
              "value_url": "https://pubmed.ncbi.nlm.nih.gov/20814559/"
            },
            {
              "attribute_source": "infores:pubmed",
              "attribute_type_id": "biolink:supporting_document_type",
              "description": "The publication type(s) for the document in which the sentence appears, as defined by PubMed; pipe-delimited",
              "value": "",
              "value_type_id": "MESH:U000020"
            },
            {
              "attribute_source": "infores:pubmed",
              "attribute_type_id": "biolink:supporting_document_year",
              "description": "The year the document in which the sentence appears was published",
              "value": "2155",
              "value_type_id": "UO:0000036"
            },
            {
              "attribute_source": "infores:pubmed",
              "attribute_type_id": "biolink:supporting_text_located_in",
              "description": "The part of the document where the sentence is located, e.g. title, abstract, introduction, conclusion, etc.",
              "value": "abstract",
              "value_type_id": "IAO_0000314"
            },
            {
              "attribute_source": "infores:text-mining-provider-targeted",
              "attribute_type_id": "biolink:extraction_confidence_score",
              "description": "The score provided by the underlying algorithm that asserted this sentence to represent the assertion specified by the parent edge",
              "value": "0.9996996",
              "value_type_id": "EDAM:data_1772"
            },
            {
              "attribute_source": "infores:text-mining-provider-targeted",
              "attribute_type_id": "biolink:subject_location_in_text",
              "description": "The start and end character offsets relative to the sentence for the subject of the assertion represented by the parent edge; start and end offsets are pipe-delimited, discontinuous spans are delimited using commas",
              "value": "94|111",
              "value_type_id": "SIO:001056"
            },
            {
              "attribute_source": "infores:text-mining-provider-targeted ",
              "attribute_type_id": "biolink:object_location_in_text",
              "description": "The start and end character offsets relative to the sentence for the object of the assertion represented by the parent edge; start and end offsets are pipe-delimited, discontinuous spans are delimited using commas",
              "value": "117|133",
              "value_type_id": "SIO:001056"
            }
          ],
          "description": "a single result from running NLP tool over a piece of text",
          "value": "tmkp:319fa8c7a361ccdddc0a5dcf2c0948205ecf4507dc41509a2ae6005cd236d062",
          "value_type_id": "biolink:TextMiningResult"
        }
      ]
    }
  ]
},
```

Instead, BTE should replace the edge attributes array with the array that comes from the "edge-attributes" response-mapping field...

Then, BTE has to keep its behavior of adding its own provenance edge attribute to that array (the first object in the example above).

This is an example of what would be expected:

```
"0d7332edf2a257ce27510db68a53c41e": {
  "predicate": "biolink:treats",
  "subject": "PUBCHEM.COMPOUND:9290",
  "object": "MONDO:0002177",
  "attributes": [
    { ## this is added by BTE
      "attribute_type_id": "biolink:aggregator_knowledge_source",
      "value": [
        "infores:translator-biothings-explorer"
      ],
      "value_type_id": "biolink:InformationResource"
    },
    ## all the attributes below are set up by the edge-attributes
    {
      "attribute_source": "infores:text-mining-provider-targeted",
      "attribute_type_id": "biolink:original_knowledge_source",
      "description": "The Text Mining Provider Targeted Biolink Association KP from NCATS Translator provides text-mined assertions from the biomedical literature.",
      "value": "infores:text-mining-provider-targeted",
      "value_type_id": "biolink:InformationResource"
    },
    {
      "attribute_source": "infores:text-mining-provider-targeted",
      "attribute_type_id": "biolink:supporting_data_source",
      "value": "infores:pubmed",
      "value_type_id": "biolink:InformationResource"
    },
    {
      "attribute_source": "infores:text-mining-provider-targeted",
      "attribute_type_id": "biolink:has_evidence_count",
      "description": "The count of the number of sentences that assert this edge",
      "value": "1",
      "value_type_id": "biolink:EvidenceCount"
    },
    {
      "attribute_source": "infores:text-mining-provider-targeted",
      "attribute_type_id": "biolink:tmkp_confidence_score",
      "description": "An aggregate confidence score that combines evidence from all sentences that support the edge",
      "value": "0.9996996",
      "value_type_id": "biolink:ConfidenceLevel"
    },
    {
      "attribute_source": "infores:pubmed",
      "attribute_type_id": "biolink:supporting_document",
      "description": "The document(s) that contains the sentence(s) that assert the Biolink association represented by the edge; pipe-delimited",
      "value": "PMID:20814559",
      "value_type_id": "biolink:Publication"
    },
    {
      "attribute_source": "infores:text-mining-provider-targeted",
      "attribute_type_id": "biolink:supporting_study_result",
      "attributes": [
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:supporting_text",
          "description": "A sentence asserting the Biolink association represented by the parent edge",
          "value": "Recommended therapy includes prevention of further absorption of the drug, inotropic therapy, calcium gluconate, and hyperinsulinemia/euglycemia therapy.",
          "value_type_id": "EDAM:data_3671"
        },
        {
          "attribute_source": "infores:pubmed",
          "attribute_type_id": "biolink:supporting_document",
          "description": "The document that contains the sentence that asserts the Biolink association represented by the parent edge",
          "value": "PMID:20814559",
          "value_type_id": "biolink:Publication",
          "value_url": "https://pubmed.ncbi.nlm.nih.gov/20814559/"
        },
        {
          "attribute_source": "infores:pubmed",
          "attribute_type_id": "biolink:supporting_document_type",
          "description": "The publication type(s) for the document in which the sentence appears, as defined by PubMed; pipe-delimited",
          "value": "",
          "value_type_id": "MESH:U000020"
        },
        {
          "attribute_source": "infores:pubmed",
          "attribute_type_id": "biolink:supporting_document_year",
          "description": "The year the document in which the sentence appears was published",
          "value": "2155",
          "value_type_id": "UO:0000036"
        },
        {
          "attribute_source": "infores:pubmed",
          "attribute_type_id": "biolink:supporting_text_located_in",
          "description": "The part of the document where the sentence is located, e.g. title, abstract, introduction, conclusion, etc.",
          "value": "abstract",
          "value_type_id": "IAO_0000314"
        },
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:extraction_confidence_score",
          "description": "The score provided by the underlying algorithm that asserted this sentence to represent the assertion specified by the parent edge",
          "value": "0.9996996",
          "value_type_id": "EDAM:data_1772"
        },
        {
          "attribute_source": "infores:text-mining-provider-targeted",
          "attribute_type_id": "biolink:subject_location_in_text",
          "description": "The start and end character offsets relative to the sentence for the subject of the assertion represented by the parent edge; start and end offsets are pipe-delimited, discontinuous spans are delimited using commas",
          "value": "94|111",
          "value_type_id": "SIO:001056"
        },
        {
          "attribute_source": "infores:text-mining-provider-targeted ",
          "attribute_type_id": "biolink:object_location_in_text",
          "description": "The start and end character offsets relative to the sentence for the object of the assertion represented by the parent edge; start and end offsets are pipe-delimited, discontinuous spans are delimited using commas",
          "value": "117|133",
          "value_type_id": "SIO:001056"
        }
      ], ## end of sub-attribute array
      "description": "a single result from running NLP tool over a piece of text",
      "value": "tmkp:319fa8c7a361ccdddc0a5dcf2c0948205ecf4507dc41509a2ae6005cd236d062",
      "value_type_id": "biolink:TextMiningResult"
    } ## end of attribute that has sub-attributes inside
  ] ## end of array of edge attributes
}, ## end of the edge object
```

colleenXu commented 2 years ago

Update: the linked PRs seem to provide the expected behavior.

Note that the ARAX UI does not correctly display nested attributes (screenshot not shown).
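
For anyone consuming these edges, nested attributes such as those under `biolink:supporting_study_result` can be reached with a short recursive walk. The sketch below is in Python for illustration only; `iter_attributes` is a hypothetical helper and says nothing about how the ARAX UI is implemented. It assumes sub-attributes live under an optional `attributes` key, as in the example in this issue.

```python
def iter_attributes(attributes, depth=0):
    """Yield (depth, attribute) pairs, descending into nested 'attributes' arrays."""
    for attribute in attributes:
        yield depth, attribute
        # "attributes" may be absent or null, so fall back to an empty list.
        yield from iter_attributes(attribute.get("attributes") or [], depth + 1)


# Trimmed-down version of the nested attribute from the example in this issue.
study_result = {
    "attribute_type_id": "biolink:supporting_study_result",
    "value_type_id": "biolink:TextMiningResult",
    "attributes": [
        {"attribute_type_id": "biolink:supporting_text", "value_type_id": "EDAM:data_3671"},
        {"attribute_type_id": "biolink:supporting_document", "value": "PMID:20814559"},
    ],
}

# Indent each attribute type by its nesting depth.
lines = [
    "  " * depth + attribute["attribute_type_id"]
    for depth, attribute in iter_attributes([study_result])
]
```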

It also seems that BTE is not providing full capacity to these endpoints (i.e., it is not running ID resolution on output IDs). I'll open another ticket to discuss this. [EDIT: the other ticket is #318]

colleenXu commented 2 years ago

close once it's confirmed that this is on prod...

colleenXu commented 2 years ago

Closing. Can use the queries from here https://github.com/biothings/BioThings_Explorer_TRAPI/issues/241#issuecomment-940375780 and look at the edges to see that we are ingesting their attributes as intended.
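
Looking at the edges can be partly automated. The following is a rough Python sketch, not an official check: `check_edge` is a hypothetical helper, and it only tests the two things this issue asks for, namely that "edge-attributes" is not ingested as a single attribute and that BTE's own provenance attribute is still present. It assumes each edge is a dict shaped like the examples in this issue.

```python
BTE_INFORES = "infores:translator-biothings-explorer"


def check_edge(edge):
    """Return a list of problems found in one edge (empty if none)."""
    problems = []
    attributes = edge.get("attributes") or []
    type_ids = [a.get("attribute_type_id") for a in attributes]
    # The mapping key itself should never show up as an attribute type.
    if "edge-attributes" in type_ids:
        problems.append("'edge-attributes' was ingested as a single attribute")
    # BTE's own provenance attribute should still be present.
    has_bte = any(
        a.get("attribute_type_id") == "biolink:aggregator_knowledge_source"
        and BTE_INFORES in (a.get("value") or [])
        for a in attributes
    )
    if not has_bte:
        problems.append("BTE aggregator_knowledge_source attribute is missing")
    return problems


bte_attribute = {
    "attribute_type_id": "biolink:aggregator_knowledge_source",
    "value": [BTE_INFORES],
    "value_type_id": "biolink:InformationResource",
}
api_attribute = {"attribute_type_id": "biolink:has_evidence_count", "value": "1"}

# Old behavior: the whole array is wrapped in one "edge-attributes" attribute.
old_edge = {"attributes": [
    bte_attribute,
    {"attribute_type_id": "edge-attributes", "value": [api_attribute]},
]}
# Expected behavior: the API's attributes sit directly in the edge's array.
new_edge = {"attributes": [bte_attribute, api_attribute]}

old_problems = check_edge(old_edge)
new_problems = check_edge(new_edge)
```

With these toy edges, `old_problems` reports the wrapped attribute and `new_problems` is empty.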