Closed: colleenXu closed this issue 2 years ago
In other words, the logic of having unique edges by subject-predicate-object-api-source needs to be expanded.
Perhaps we can use specific edge-attributes: when they exist and differ between records, turn these records into separate edges. I suggest the biolink:has_disease_context one above for this particular API...
Just to solidify my understanding, are you saying that we simply want to add specific edge attributes to the generation of unique edge IDs/comparison of edges, similarly to this issue?
If this is the case, we could do something along the lines of one of the following:
Does one of these cover your expected behavior?
I agree that there is an issue here -- only having one of the three records represented is not quite right. But I think the desired behavior is not clear enough to say it's ready for implementation. I'm not convinced that these three records should stay as three separate edges. I think we need to raise this on the Architecture call to see if there is a best practice for how to handle this...
It sounds like this issue may need further discussion with Multiomics team (Guangrong) and maybe the rest of Translator, especially after I reviewed my understanding of what this API's data is...
At the moment, I think "edge merging and overwriting the edge-attributes" is happening for...
[EDIT] Adding that for each analysis (regardless of it being the same disease context or not), there's a different t-test value and effect-size value...and how to preserve / structure those values is something to consider as well...
My thoughts are that records for point 1 should be two completely separate edges, and records for point 2 should be just 1 edge, but maybe with multiple values to explain the "replicates" thing....
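A rough sketch of what that could look like, assuming point 1 means records with different disease contexts and point 2 means replicates within the same context. The flat record shape and the function name `merge_replicates` are hypothetical, not BTE's actual data model; only the two attribute type IDs come from the example edge in this thread.

```python
from collections import defaultdict

CONTEXT_ID = "biolink:has_disease_context"
P_VALUE_ID = "EDAM:data_0951"  # attribute type carrying the p-value in the example edge


def merge_replicates(records):
    """Group hypothetical flat records by subject/predicate/object/context.

    Records with different disease contexts stay separate edges; records that
    share a context become one edge whose p-value field is a list of values.
    """
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["subject"], rec["predicate"], rec["object"], rec.get(CONTEXT_ID))
        grouped[key].append(rec[P_VALUE_ID])
    return [
        {"subject": s, "predicate": p, "object": o, CONTEXT_ID: ctx, P_VALUE_ID: values}
        for (s, p, o, ctx), values in grouped.items()
    ]
```

This only covers the p-value; whether other per-analysis values (t-test, effect size) should be carried the same way is still the open question here.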
(and a technical note: I think the parser for this API or BTE's coding can handle whatever choices we make, and the data doesn't necessarily need to change)
This issue may need to be closed, to open more specific issues around edge-merging:
@andrewsu said we want to collect the list of edge-attributes that are related to "context" (at what level of naming: the name of the field in x-bte-response-mapping, or whatever it's called inside a record after api-response-transform?). These can then be used for hashing when they are available.
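To make the hashing idea concrete, here is a minimal sketch (not BTE's actual code). It assumes a hypothetical record shape with `subject`, `predicate`, `object`, `api`, `source`, and a TRAPI-style `attributes` list; `edge_hash` and `CONTEXT_ATTRIBUTE_IDS` are made-up names, and the two `EXAMPLE:` context values are placeholders. Context attributes are appended to the hash input only when they are present on the record.

```python
import hashlib

# Hypothetical list of "context" attribute type IDs; only
# biolink:has_disease_context is taken from this thread.
CONTEXT_ATTRIBUTE_IDS = ("biolink:has_disease_context",)


def edge_hash(record, context_ids=CONTEXT_ATTRIBUTE_IDS):
    """Hash subject-predicate-object-api-source plus any context attributes present."""
    parts = [record["subject"], record["predicate"], record["object"],
             record["api"], record["source"]]
    attrs = {a["attribute_type_id"]: a["value"] for a in record.get("attributes", [])}
    for type_id in sorted(context_ids):
        if type_id in attrs:  # skip context attributes the record doesn't have
            parts.append(f"{type_id}={attrs[type_id]}")
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def make_record(disease_context):
    """Build a hypothetical record that differs only in its disease context."""
    return {
        "subject": "NCBIGene:64963",
        "predicate": "biolink:associated_with_sensitivity_to",
        "object": "PUBCHEM.COMPOUND:71271629",
        "api": "drug response kp api",  # placeholder label, not the real API name
        "source": "infores:biothings-multiomics-biggim-drugresponse",
        "attributes": [
            {"attribute_type_id": "biolink:has_disease_context", "value": disease_context}
        ],
    }


# MONDO:0020311 is from the example edge; the other two values are placeholders.
records = [make_record(c) for c in ("MONDO:0020311", "EXAMPLE:context_a", "EXAMPLE:context_b")]

print(len({edge_hash(r) for r in records}))                  # 3 distinct edge IDs
print(len({edge_hash(r, context_ids=()) for r in records}))  # 1: records collapse without context
```

Records lacking every listed context attribute hash exactly as they would on the base fields alone, so APIs without context attributes are unaffected.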
"biological replicates" may need to be handled outside of BTE / the API parsers, by the data providers themselves. It's not clear how BTE would know how to merge the "edge-attributes" when everything is the same except some specific edge-attributes (t-test and pvalue, in this dataset's case)
Note: I'd have to review the raw data for drug response kp api, to see if the "biological replicates" discussion is still valid...I'm not seeing effect-size edge-attribute or the "duplicates" that I previously discussed in the recently updated biothings api...
Decisions from lab meeting:
Example of record from multiomics drug response kp api (biothings) api with the disease-context:
Note for possible implementation:
Update after talking with Guangrong:
We should therefore address this like we have for disease context: by making sure the edges are kept unique!!!
Also after talking to Guangrong: they want to see their KP being called. Making a PR to add this API to BTE...
note: BTE doesn't ingest this api directly, but it can be queried using http://localhost:3000/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query
EDIT: updated link. There are 3 records here that differ in their edge-attributes (particularly the "biolink:has_disease_context" one). We would want them in the TRAPI response as 3 separate KG edges. However, when running a query through BTE, only 1 record (the last one in the biothings output) shows up as an edge.
TRAPI query from gene -> chem
```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["ENSEMBL:ENSG00000181991"],
          "categories": ["biolink:Gene"]
        },
        "n1": {
          "ids": ["PUBCHEM.COMPOUND:71271629"],
          "categories": ["biolink:SmallMolecule"]
        }
      },
      "edges": {
        "e1": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:associated_with_sensitivity_to"]
        }
      }
    }
  }
}
```
The only KG edge found matches the last entry in the biothings query:
```
"95d03331aa6e6a6cf6a28bedf137b113": {
  "predicate": "biolink:associated_with_sensitivity_to",
  "subject": "NCBIGene:64963",
  "object": "PUBCHEM.COMPOUND:71271629",
  "attributes": [
    {
      "attribute_type_id": "biolink:aggregator_knowledge_source",
      "value": ["infores:biothings-explorer"],
      "value_type_id": "biolink:InformationResource"
    },
    {
      "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
      "attribute_type_id": "biolink:GeneToDrugAssociation",
      "description": "Sensitivity to the drug is associated with expression of the gene",
      "value": "biolink:GeneHasExpressionThatContributesToDrugSensitivityAssociation",
      "value_type_id": "biolink:id"
    },
    {
      "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
      "attribute_type_id": "BAO:0002162",
      "description": "Method used to quantify the strength of the association is AUC",
      "value": "BAO:0002120",
      "value_type_id": "biolink:id"
    },
    {
      "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
      "attribute_type_id": "EDAM:data_0951",
      "attributes": {
        "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
        "attribute_type_id": "NCIT:C53236",
        "description": "Spearman Correlation Test was used to compute the p-value for the association",
        "value": "NCIT:C53249",
        "value_type_id": "biolink:id"
      },
      "description": "Confidence metric for the association",
      "value": 0.007547007781067878,
      "value_type_id": "EDAM:data_1669"
    },
    {
      "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
      "attribute_type_id": "GECKO:0000106",
      "description": "Sample size used to compute the correlation",
      "value": 10
    },
    {
      "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
      "attribute_type_id": "biolink:has_disease_context",
      "description": "Disease context for the gene-drug sensitivity association",
      "value": "MONDO:0020311",
      "value_type_id": "biolink:id"
    },
    {
      "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
      "attribute_type_id": "biolink:Dataset",
      "attributes": {
        "attribute_source": "infores:biothings-multiomics-biggim-drugresponse",
        "attribute_type_id": "biolink:Publication",
        "description": "Publication describing the dataset used to compute the association",
        "value": "PMID:27397505",
        "value_type_id": "biolink:id"
      },
      "description": "Dataset used to compute the association",
      "value": "GDSC",
      "value_type_id": null
    }
  ]
}
```
Note: there is a related issue regarding "missing records". However, it doesn't apply to the example above (<1000 records returned in the biothings response).
Sometimes records will be "missing" because of the 1000-record limit on what is returned from biothings api queries.
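One way to tell the two cases apart is to check whether a response was truncated. The sketch below assumes the response is a dict with a `total` count and a `hits` list, which is how BioThings query responses are generally shaped; the function name `is_truncated` is hypothetical.

```python
def is_truncated(response):
    """Return True if the response reports more matching records than it returned."""
    return response.get("total", 0) > len(response.get("hits", []))


# Toy responses, not real API output
print(is_truncated({"total": 3, "hits": [{}, {}, {}]}))   # False
print(is_truncated({"total": 2500, "hits": [{}] * 1000}))  # True
```

If this returns False for a query (as in the 3-record example above), any edges missing from the TRAPI response come from edge merging rather than from the record limit.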