NCATSTranslator / testing

Materials and tools for testing Translator components

What proteins are in the ERAD BiologicalProcess/Pathway? - Two Actors Respond #38

Open sstemann opened 3 years ago

sstemann commented 3 years ago

Query: [erad](https://github.com/NCATSTranslator/testing/blob/main/ars-requests/not-none/erad.json) PK: 8c401fb4-922f-4c49-adfe-436314838c74 GO: 0036503 Results Tracking Sheet

image

Responses From:

Note: We also tried it with flipped Object/Subject (ARAGORN returned 8 copies of "protein"). We also tried it with Biolink category:pathway and got the same result set (e6b2a4a6-1501-4120-9b7c-16f25657c454).

cbizon commented 3 years ago

ranking-agent lumps genes and proteins together under Gene. So if n1.category = "biolink:Gene", we return results, and likewise if n1.category = "biolink:GeneOrGeneProduct". As long as ARAs can handle biolink subclassing, using that union might lead to results from more of them, since it should work whether ARAs use genes or proteins.
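To make the subclassing point concrete, here is a minimal Python sketch of how a service might decide whether a queried category matches the category it stores. The `CHILDREN` table is a hand-written, hypothetical fragment for illustration only (a real service would load the hierarchy from the Biolink Model), and `expand_category` and `matches` are made-up helper names, not part of ranking-agent or any ARA.

```python
# Hypothetical, hand-written fragment of the category hierarchy.
# Assumption: a real service would load this from the Biolink Model instead.
CHILDREN = {
    "biolink:GeneOrGeneProduct": ["biolink:Gene", "biolink:Protein"],
}


def expand_category(category):
    """Return the category plus all of its descendants found in CHILDREN."""
    expanded = {category}
    stack = [category]
    while stack:
        current = stack.pop()
        for child in CHILDREN.get(current, []):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def matches(query_category, stored_category):
    """True if a node stored under stored_category answers query_category."""
    return stored_category in expand_category(query_category)
```

Under these assumptions, `matches("biolink:GeneOrGeneProduct", "biolink:Gene")` is true whether the data is stored as genes or proteins, while `matches("biolink:Protein", "biolink:Gene")` is false, which is consistent with a `biolink:Protein` query coming back empty from a service that stores only genes.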

sstemann commented 3 years ago

@cbizon still two responses (a0fe5554-4c2e-472b-9cf6-6f0cc5556de2), but this time from Aragorn and CAM, so I'm not sure the goal of subclassing is achieved. I also tried it with a node array, and it got the original two plus Aragorn and CAM (15bf580c-a53b-477e-9d2b-bb25c35c2cd6), so I'm not sure if this is an ARS requirement to generalize with arrays or if this should be handled somewhere else?

brettasmi commented 3 years ago

5c3d4b40-8da2-42db-ab9c-bc572d5d63e4:

{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "n0",
          "subject": "n1"
        }
      },
      "nodes": {
        "n0": {
          "category": "biolink:BiologicalProcess",
          "id": "GO:0036503"
        },
        "n1": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

and c0ec4c9a-f109-4ca0-87ba-22ca195dfcfc:

{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "n0",
          "subject": "n1"
        },
        "e02": {
          "object": "n1",
          "subject": "n2"
        }
      },
      "nodes": {
        "n0": {
          "category": "biolink:BiologicalProcess",
          "id": "GO:0036503"
        },
        "n1": {
          "category": "biolink:Gene"
        },
        "n2": {
          "category": "biolink:Protein"
        }
      }
    }
  }
}

Should work for us, but for whatever reason, I'm having trouble seeing the results from the ARS right now.

southalln commented 3 years ago

I'd like to come back to the point "ranking-agent lumps genes and proteins together under Gene". In the original FOA we had written something about expectations for the ARAs to the effect that "the autonomous relay agents should adroitly handle the integration of knowledge from multiple Knowledge Providers and thus multiple different domains of biomedical knowledge." I think it is reasonable for the user to pose the question requesting proteins, and then the expectation would be for the ARA to know that KPs, including whatever resources ranking-agent calls out to, require a gene identifier in lieu of a protein identifier.

sstemann commented 3 years ago

@brettasmi @cbizon if the TRAPI query is written such that:

"n1": { "category": [ "biolink:GeneOrGeneProduct", "biolink:Protein", "biolink:Gene" ] }

then yes, many ARAs return responses (8b63edcf-3c04-480a-889c-6d04029354ac)

cbizon commented 3 years ago

@sstemann that's interesting. You shouldn't have to include gene and protein as well. I would consider not responding to the superclass biolink:GeneOrGeneProduct a bug in the ARA.

@southalln: yeah. Maybe another way to say what you are saying is: if ranking agent is going to mush gene and protein information together, then it should also respond to queries in the same way (consider gene and protein queries as queries for the same thing). I am hoping that the discussion of formal conflation that Mike Bada instigated and is simmering in the data modeling meetings will provide a more complete solution to this issue.
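As a rough illustration of "consider gene and protein queries as queries for the same thing", here is a small Python sketch that widens a node's requested categories before lookup. The `CONFLATED` set and the `conflate_categories` helper are hypothetical names chosen for this example; they are not an agreed Translator conflation rule or anything ranking-agent is known to implement.

```python
# Assumption: these two categories are treated as interchangeable.
# The set is illustrative, not an agreed Translator conflation rule.
CONFLATED = frozenset({"biolink:Gene", "biolink:Protein"})


def conflate_categories(categories):
    """Expand a category (string or list) so Gene and Protein imply each other."""
    if isinstance(categories, str):
        categories = [categories]
    result = list(categories)
    # Only widen the request when it already asks for a conflated category.
    if CONFLATED.intersection(result):
        for category in sorted(CONFLATED):
            if category not in result:
                result.append(category)
    return result
```

With this sketch a request for `"biolink:Protein"` is widened to include `"biolink:Gene"`, while unrelated categories such as `"biolink:BiologicalProcess"` pass through unchanged.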

sstemann commented 3 years ago

@cbizon I like the sound of that, but the results appear to be different: without biolink:Gene (93aa46eb-6613-465e-85ff-4e635ccbb6d8) we lost results from BTE and Improve:

image

MarkDWilliams commented 3 years ago

Yeah, I'm not sure how consistently the biolink superclasses (GeneOrGeneProduct, DiseaseOrPhenotypicFeature) are implemented across the ARAs. Sounds like something that's worth touching on briefly at today's standup.

cbizon commented 3 years ago

Yep. Tests for this are part of the /testing/onehops framework for what it's worth.

balhoff commented 3 years ago

@Shalsh23 asked me about this query for CAM-KP. We have two issues with it working correctly: (1) apparently we need GeneOrGeneProduct instead of Protein, and (2) we have a performance issue with unbound predicates; it works with related_to (this is our bug).
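Until the unbound-predicate issue is fixed, a caller could bind a default predicate before sending the query. The following Python sketch assumes the query is a plain dict in the shape used in this thread; `bind_default_predicates` is a hypothetical helper, not part of CAM-KP or the ARS.

```python
import copy

# The predicate reported above to work with CAM-KP.
DEFAULT_PREDICATE = "biolink:related_to"


def bind_default_predicates(query, default=DEFAULT_PREDICATE):
    """Return a copy of the query with every unbound edge predicate set."""
    bound = copy.deepcopy(query)  # leave the caller's query untouched
    edges = bound["message"]["query_graph"]["edges"]
    for edge in edges.values():
        # Skip edges that already carry a predicate, under either the
        # singular key used in this thread or the plural "predicates" key.
        if "predicate" not in edge and "predicates" not in edge:
            edge["predicate"] = default
    return bound
```

Edges that already specify a predicate are left alone, so the helper only changes queries that would otherwise hit the slow path.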

This query works with CAM-KP:

{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "subject": "n1",
          "object": "n0",
          "predicate": "biolink:related_to"
        }
      },
      "nodes": {
        "n0": {
          "id": "GO:0036503",
          "category": "biolink:BiologicalProcess"
        },
        "n1": {
          "category": "biolink:GeneOrGeneProduct"
        }
      }
    }
  }
}

colleenXu commented 3 years ago

I ran the updated query below through the ARS, PK: c32251cb-c6ed-4ea5-b558-3916c84d34cc

Screen Shot 2021-10-12 at 8 39 18 PM

@andrewsu Exploring Agent can be untagged from this issue

The updated query could be something like this (it's missing the workflow section but the rest is TRAPI v1.2):

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["GO:0036503"],
                    "categories": ["biolink:BiologicalProcess"]
                },
                "n1": {
                    "categories": ["biolink:Protein"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}
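
For the older queries earlier in this thread, the move from singular `id` / `category` / `predicate` keys to the plural list-valued keys shown above can be sketched in Python as follows. `to_plural_keys` is a hypothetical helper; it only renames the keys that appear in this thread and does not add the workflow section or validate the result against the TRAPI schema.

```python
import copy

# Singular keys used earlier in this thread -> plural list-valued keys.
NODE_KEYS = {"id": "ids", "category": "categories"}
EDGE_KEYS = {"predicate": "predicates"}


def _pluralize(item, key_map):
    """Rename singular keys in place, wrapping scalar values in a list."""
    for old, new in key_map.items():
        if old in item:
            value = item.pop(old)
            item[new] = value if isinstance(value, list) else [value]


def to_plural_keys(query):
    """Return a copy of the query that uses plural node and edge keys."""
    upgraded = copy.deepcopy(query)  # do not mutate the caller's query
    graph = upgraded["message"]["query_graph"]
    for node in graph["nodes"].values():
        _pluralize(node, NODE_KEYS)
    for edge in graph["edges"].values():
        _pluralize(edge, EDGE_KEYS)
    return upgraded
```

Values that are already lists, such as a `category` array, are kept as they are and only moved to the new key.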