sstemann opened this issue 3 years ago
ranking-agent lumps genes and proteins together under Gene, so if n1.category = "biolink:Gene", we return results. We also return results if n1.category = "biolink:GeneOrGeneProduct". As long as ARAs can handle Biolink subclassing, using that superclass (which covers both genes and gene products) might lead to results from more of them, since it should work whether an ARA uses genes or proteins.
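A minimal sketch of what "handling Biolink subclassing" could mean on the ARA side, assuming a hand-written, heavily simplified excerpt of the Biolink hierarchy (the real model has more classes and intermediate mixins). The idea is that a query category matches a KP category when the KP category is that class or one of its descendants. The names `CHILDREN`, `descendants`, and `matches` are hypothetical.

```python
# Hypothetical, simplified excerpt of the Biolink class hierarchy:
# each key maps a class to its direct children.
CHILDREN = {
    "biolink:GeneOrGeneProduct": ["biolink:Gene", "biolink:Protein"],
    "biolink:Gene": [],
    "biolink:Protein": [],
}


def descendants(category: str) -> set:
    """Return the category together with all of its descendants."""
    result = {category}
    stack = [category]
    while stack:
        current = stack.pop()
        for child in CHILDREN.get(current, []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def matches(query_category: str, kp_category: str) -> bool:
    """True if a node of kp_category satisfies a query for query_category."""
    return kp_category in descendants(query_category)


# A KP that only knows about genes can still answer a GeneOrGeneProduct query...
print(matches("biolink:GeneOrGeneProduct", "biolink:Gene"))  # True
# ...but under strict subclassing a Protein query does not match Gene nodes.
print(matches("biolink:Protein", "biolink:Gene"))  # False
```

The second case is exactly where a tool that lumps genes and proteins together under Gene would need an extra rule beyond plain subclassing.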
@cbizon Still two responses (a0fe5554-4c2e-472b-9cf6-6f0cc5556de2), but this time from Aragorn and CAM, so I'm not sure the goal of subclassing is achieved. I tried it with a node array and got the original two plus Aragorn and CAM (15bf580c-a53b-477e-9d2b-bb25c35c2cd6), so I'm not sure whether generalizing with arrays is an ARS requirement or whether this should be handled somewhere else.
5c3d4b40-8da2-42db-ab9c-bc572d5d63e4:
```json
{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "n0",
          "subject": "n1"
        }
      },
      "nodes": {
        "n0": {
          "category": "biolink:BiologicalProcess",
          "id": "GO:0036503"
        },
        "n1": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}
```
and c0ec4c9a-f109-4ca0-87ba-22ca195dfcfc:
```json
{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "n0",
          "subject": "n1"
        },
        "e02": {
          "object": "n1",
          "subject": "n2"
        }
      },
      "nodes": {
        "n0": {
          "category": "biolink:BiologicalProcess",
          "id": "GO:0036503"
        },
        "n1": {
          "category": "biolink:Gene"
        },
        "n2": {
          "category": "biolink:Protein"
        }
      }
    }
  }
}
```
Should work for us, but for whatever reason, I'm having trouble seeing the results from the ARS right now.
I'd like to come back to the point that "ranking-agent lumps genes and proteins together under Gene". In the original FOA, we wrote something about our expectations for the ARAs, to the effect that "the autonomous relay agents should adroitly handle the integration of knowledge from multiple Knowledge Providers and thus multiple different domains of biomedical knowledge." I think it is reasonable for the user to pose the question requesting proteins; the ARA would then be expected to know that certain KPs, including whatever resources the Ranking Agent calls out to, require a gene identifier in lieu of a protein identifier.
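A minimal sketch of the kind of adaptation an ARA might do before forwarding a protein question to a gene-only KP, assuming TRAPI 1.2-style `categories` lists and a hypothetical capability set and substitution table (`KP_SUPPORTED`, `SUBSTITUTES`, and `adapt_query_graph` are illustrative names, not part of any ARA's code).

```python
import copy

# Hypothetical: categories accepted by some KP that only indexes genes
# (for example, one of the resources the Ranking Agent calls out to).
KP_SUPPORTED = {"biolink:Gene", "biolink:BiologicalProcess"}

# Hypothetical substitution rule: ask that KP for genes when the user asked for proteins.
SUBSTITUTES = {"biolink:Protein": "biolink:Gene"}


def adapt_query_graph(query_graph: dict, supported: set) -> dict:
    """Return a copy of a TRAPI 1.2-style query graph whose node categories
    are swapped for ones the KP supports, where a substitute is known."""
    adapted = copy.deepcopy(query_graph)
    for node in adapted["nodes"].values():
        rewritten = []
        for category in node.get("categories") or []:
            if category not in supported:
                category = SUBSTITUTES.get(category, category)
            if category not in rewritten:
                rewritten.append(category)
        if rewritten:
            node["categories"] = rewritten
    return adapted


example = {
    "nodes": {
        "n0": {"ids": ["GO:0036503"], "categories": ["biolink:BiologicalProcess"]},
        "n1": {"categories": ["biolink:Protein"]},
    },
    "edges": {"e0": {"subject": "n0", "object": "n1"}},
}
print(adapt_query_graph(example, KP_SUPPORTED)["nodes"]["n1"])
# {'categories': ['biolink:Gene']}
```

This only rewrites categories. Mapping protein identifiers to gene identifiers, and translating the KP's gene answers back into the protein terms the user asked for, would also be needed and is not shown.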
@brettasmi @cbizon If the TRAPI query is written such that:
```json
"n1": { "category": [ "biolink:GeneOrGeneProduct", "biolink:Protein", "biolink:Gene" ] }
```
then yes, many ARAs return responses (8b63edcf-3c04-480a-889c-6d04029354ac).
@sstemann that's interesting. You shouldn't have to include gene and protein as well. I would consider not responding to the superclass biolink:GeneOrGeneProduct a bug in the ARA.
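A small sketch of why the explicit list should be redundant, assuming an ARA expands each query category to include its descendants (same simplified, hypothetical hierarchy excerpt as a real Biolink lookup would provide). The `expand` helper also accepts either a single string or a list, since TRAPI 1.0 allowed both forms for `category`.

```python
# Hypothetical, simplified excerpt of the Biolink hierarchy (class -> direct children).
CHILDREN = {
    "biolink:GeneOrGeneProduct": ["biolink:Gene", "biolink:Protein"],
}


def expand(categories) -> frozenset:
    """Expand a query node's category (a string or a list of strings)
    into every category it should match, including descendants."""
    if isinstance(categories, str):
        categories = [categories]
    result = set()
    stack = list(categories)
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(CHILDREN.get(current, []))
    return frozenset(result)


superclass_only = expand("biolink:GeneOrGeneProduct")
explicit_list = expand(["biolink:GeneOrGeneProduct", "biolink:Protein", "biolink:Gene"])
print(superclass_only == explicit_list)  # True
```

If an ARA returns different results for these two forms, that suggests it is matching category strings literally rather than expanding the superclass.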
@southalln: yeah. Maybe another way to say what you are saying is: if ranking agent is going to mush gene and protein information together, then it should also respond to queries in the same way (consider gene and protein queries as queries for the same thing). I am hoping that the discussion of formal conflation that Mike Bada instigated and is simmering in the data modeling meetings will provide a more complete solution to this issue.
@cbizon I like the sound of that, but the results appear to be different: without biolink:Gene (93aa46eb-6613-465e-85ff-4e635ccbb6d8), we lost results from BTE and Improve.
Yeah, I'm not sure how consistently the Biolink superclasses (GeneOrGeneProduct, DiseaseOrPhenotypicFeature) are implemented across the ARAs. Sounds like something worth touching on briefly at today's standup.
Yep. For what it's worth, tests for this are part of the /testing/onehops framework.
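A minimal sketch of the consistency check being discussed (not the /testing/onehops code): given the answer node IDs from a superclass query and from the corresponding subclass queries, report any answers the superclass query missed. The function name and the `example:` IDs are hypothetical placeholders, not real results.

```python
def missing_from_superclass(superclass_ids: set,
                            subclass_ids: dict) -> dict:
    """For each subclass query, return the answer node IDs that the
    superclass query failed to return (empty gaps are omitted)."""
    missing = {}
    for category, ids in subclass_ids.items():
        gap = ids - superclass_ids
        if gap:
            missing[category] = gap
    return missing


# Placeholder IDs for illustration only.
superclass = {"example:gene1", "example:protein1"}
by_subclass = {
    "biolink:Gene": {"example:gene1", "example:gene2"},
    "biolink:Protein": {"example:protein1"},
}
print(missing_from_superclass(superclass, by_subclass))
# {'biolink:Gene': {'example:gene2'}}
```

A non-empty result is the pattern described above, where dropping biolink:Gene from the query lost answers that a tool only returns for the explicit subclass.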
@Shalsh23 asked me about this query for CAM-KP. We have two issues that keep it from working correctly: (1) apparently we need GeneOrGeneProduct instead of Protein, and (2) we have a performance issue with unbound predicates (this is our bug); it works with related_to.
This query works with CAM-KP:
```json
{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "subject": "n1",
          "object": "n0",
          "predicate": "biolink:related_to"
        }
      },
      "nodes": {
        "n0": {
          "id": "GO:0036503",
          "category": "biolink:BiologicalProcess"
        },
        "n1": {
          "category": "biolink:GeneOrGeneProduct"
        }
      }
    }
  }
}
```
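A minimal sketch of the related_to workaround as a client-side preprocessing step, assuming TRAPI 1.0-style singular `predicate` keys as in the query above. Because every Biolink predicate descends from biolink:related_to, filling in the root predicate asks the same question while avoiding the unbound-predicate code path. `bind_predicates` is a hypothetical helper, not part of CAM-KP.

```python
import copy

ROOT_PREDICATE = "biolink:related_to"


def bind_predicates(query_graph: dict) -> dict:
    """Return a copy of the query graph in which every edge with no
    predicate (missing or null) is given the root predicate."""
    bound = copy.deepcopy(query_graph)
    for edge in bound["edges"].values():
        if not edge.get("predicate"):
            edge["predicate"] = ROOT_PREDICATE
    return bound


query = {
    "edges": {"e01": {"object": "n0", "subject": "n1"}},
    "nodes": {
        "n0": {"category": "biolink:BiologicalProcess", "id": "GO:0036503"},
        "n1": {"category": "biolink:GeneOrGeneProduct"},
    },
}
print(bind_predicates(query)["edges"]["e01"]["predicate"])  # biolink:related_to
```

Edges that already carry a predicate are left alone, so the helper is safe to apply to any query before sending it.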
I ran the updated query below through the ARS, PK: c32251cb-c6ed-4ea5-b558-3916c84d34cc
@andrewsu Exploring Agent can be untagged from this issue
The updated query could be something like this (it's missing the workflow section but the rest is TRAPI v1.2):
```json
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["GO:0036503"],
          "categories":["biolink:BiologicalProcess"]
        },
        "n1": {
          "categories": ["biolink:Protein"]
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1"
        }
      }
    }
  }
}
```
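A minimal sketch of the key changes between the older queries earlier in this thread and the TRAPI 1.2 form: singular `id`/`category` on nodes become list-valued `ids`/`categories`, and an edge `predicate` becomes a `predicates` list. It only covers these renames; the workflow section and any other version differences are out of scope, and `upgrade_query_graph` is a hypothetical helper.

```python
import copy

# Singular TRAPI 1.0 keys and their plural, list-valued TRAPI 1.2 replacements.
NODE_KEYS = {"id": "ids", "category": "categories"}
EDGE_KEYS = {"predicate": "predicates"}


def _rename(obj: dict, mapping: dict) -> None:
    """Replace each old key with its new key, wrapping scalars in a list."""
    for old, new in mapping.items():
        if old in obj:
            value = obj.pop(old)
            if value is not None:
                obj[new] = value if isinstance(value, list) else [value]


def upgrade_query_graph(query_graph: dict) -> dict:
    """Return a copy of a TRAPI 1.0-style query graph using 1.2-style keys."""
    upgraded = copy.deepcopy(query_graph)
    for node in upgraded["nodes"].values():
        _rename(node, NODE_KEYS)
    for edge in upgraded["edges"].values():
        _rename(edge, EDGE_KEYS)
    return upgraded


old = {
    "edges": {"e01": {"object": "n0", "subject": "n1"}},
    "nodes": {
        "n0": {"category": "biolink:BiologicalProcess", "id": "GO:0036503"},
        "n1": {"category": "biolink:Gene"},
    },
}
print(upgrade_query_graph(old)["nodes"]["n0"])
# {'ids': ['GO:0036503'], 'categories': ['biolink:BiologicalProcess']}
```

Values that are already lists (such as the node-array category used earlier in the thread) pass through unchanged.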
Query: [erad](https://github.com/NCATSTranslator/testing/blob/main/ars-requests/not-none/erad.json)
PK: 8c401fb4-922f-4c49-adfe-436314838c74
GO:0036503
Results Tracking Sheet
Responses From:
Note: We also tried it with Object/Subject flipped (ARAGORN returned 8 copies of "protein"). We also tried it with the Biolink category Pathway and got the same result set (e6b2a4a6-1501-4120-9b7c-16f25657c454).
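A minimal sketch of the "flipped Object/Subject" variant, assuming TRAPI 1.2-style query graphs like the erad query. Swapping only makes sense when the edge has no predicate or a symmetric one; a directed predicate would also need to be replaced by its inverse, which this hypothetical `flip_edges` helper does not attempt.

```python
import copy


def flip_edges(query_graph: dict) -> dict:
    """Return a copy of the query graph with subject and object swapped
    on every edge (predicates, if any, are left untouched)."""
    flipped = copy.deepcopy(query_graph)
    for edge in flipped["edges"].values():
        edge["subject"], edge["object"] = edge["object"], edge["subject"]
    return flipped


erad_like = {
    "nodes": {
        "n0": {"ids": ["GO:0036503"], "categories": ["biolink:BiologicalProcess"]},
        "n1": {"categories": ["biolink:Protein"]},
    },
    "edges": {"e0": {"subject": "n0", "object": "n1"}},
}
print(flip_edges(erad_like)["edges"]["e0"])  # {'subject': 'n1', 'object': 'n0'}
```

Submitting both the original and the flipped graph is one way to check whether an ARA's answers depend on edge direction.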