RobokopU24 / Feedback

Feedback on the ROBOKOP project
https://robokop.renci.org

Query Failure #108

Closed karafecho closed 1 year ago

karafecho commented 1 year ago

This issue reports a ROBOKOP query that I ran multiple times but that never returned a response, not even "no answers are available." Note that the image below was captured after the query was submitted to ROBOKOP and after ROBOKOP attempted to fetch answers. Note also that Nyssa was able to run this query through ExEmplar, even with more specific predicates.

PUBCHEM.COMPOUND:6618 = Tetrabromobisphenol A - related_to - NamedThing - related_to - Gene - related_to - MONDO:0004995 = cardiovascular disease
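
For reference, here is a minimal sketch of how the four-node path above might be written as a TRAPI-style query graph in Python. The helper name `build_query` and the node and edge keys (`n0`..`n3`, `e0`..`e2`) are illustrative and not taken from the ROBOKOP UI, and the Biolink CURIEs are assumed mappings of the labels in the path.

```python
import json


def build_query(chemical_id, disease_id):
    """Build a TRAPI-style message for: chemical - NamedThing - Gene - disease.

    Hypothetical helper for illustration; key names are assumptions.
    """
    nodes = {
        "n0": {"ids": [chemical_id]},                  # pinned chemical
        "n1": {"categories": ["biolink:NamedThing"]},  # wildcard node
        "n2": {"categories": ["biolink:Gene"]},
        "n3": {"ids": [disease_id]},                   # pinned disease
    }
    # One related_to edge between each consecutive pair of nodes.
    edges = {
        f"e{i}": {
            "subject": f"n{i}",
            "object": f"n{i + 1}",
            "predicates": ["biolink:related_to"],
        }
        for i in range(3)
    }
    return {"message": {"query_graph": {"nodes": nodes, "edges": edges}}}


query = build_query("PUBCHEM.COMPOUND:6618", "MONDO:0004995")
print(json.dumps(query, indent=2))
```

Writing the query out this way makes it easier to see that only the two end nodes are pinned, so the wildcard in the middle is what drives the size of the search.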

image

karafecho commented 1 year ago

Additional information: Nyssa ran the query that produced the results below using ExEmplar.

The query was structured as: Flame retardant (ChemicalEntity) - Gene - NamedThing (wildcard) - CVD endpoint (DiseaseOrPhenotypicFeature). The predicates shown in the figure are derived from the first edge.

I tried to run the same query the very next day using ROBOKOP but was unsuccessful. Note that I tried several different identifiers for n0 and n3, as well as various predicates, but nothing worked. The interface just spun while fetching answers and then stopped, with no error and no indication of whether there simply were no results available.

image

cbizon commented 1 year ago

Can Nyssa/Jon-Michael tell us exactly what Cypher was being run by ExEmplar?

karafecho commented 1 year ago

From Nyssa:

MATCH (n0_0:biolink:ChemicalEntity)-[r0_0]-(n1_0:biolink:GeneOrGeneProduct)-[r1_0]-(n2_0)-[r2_0]-(n3_0:biolink:DiseaseOrPhenotypicFeature),
      (n0_1:biolink:ChemicalEntity)-[r0_1]-(n1_1)-[r1_1]-(n2_1:biolink:GeneOrGeneProduct)-[r2_1]-(n3_1:biolink:DiseaseOrPhenotypicFeature)
WHERE any(x IN ['Tetrabromobisphenol A', '1,2,5,6,9,10-Hexabromocyclododecane', 'Triethyl phosphate'] WHERE x IN n0_0.name OR x IN n0_0.equivalent_identifiers)
  AND any(x IN ['MONDO:0005267', 'UMLS:C0876994', 'cardiovascular disorder'] WHERE x IN n3_0.name OR x IN n3_0.equivalent_identifiers)
  AND any(x IN ['Tetrabromobisphenol A', '1,2,5,6,9,10-Hexabromocyclododecane', 'Triethyl phosphate'] WHERE x IN n0_1.name OR x IN n0_1.equivalent_identifiers)
  AND any(x IN ['MONDO:0005267', 'UMLS:C0876994', 'cardiovascular disorder'] WHERE x IN n3_1.name OR x IN n3_1.equivalent_identifiers)
RETURN * LIMIT 100

image

karafecho commented 1 year ago

I reproduced the above issue by way of a separate query, shown below. Importantly, I ran this query a few weeks ago and was able to generate results.

image

cbizon commented 1 year ago

Query 1 is very big. If I set the NamedThing to Gene, then there are 18000 results in ROBOKOP.

With NamedThing it is worse for multiple reasons, including some subclass-of edges that we probably should do something about. ExEmplar limits results to 100 (arbitrarily), so that is not a good comparison.

cbizon commented 1 year ago

I'm not sure why query 2 is dragging, seems like it should be good...

cbizon commented 1 year ago

This one actually does return:

image

It's maybe not as quick as I'd like, but it is returning lots of results. There do seem to be duplicates that I would hope would be filtered out...
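
One possible way to filter such duplicates is sketched below. It assumes that two results count as duplicates when they bind the same graph node to every query node; the function name `dedupe_results` and the simplified `node_bindings` shape (query node key mapped to a single identifier) are hypothetical and not ROBOKOP's actual result format.

```python
def dedupe_results(results):
    """Keep the first result for each distinct set of node bindings.

    Assumption: each result is a dict whose "node_bindings" maps a query
    node key (e.g. "n0") to one identifier string.
    """
    seen = set()
    unique = []
    for result in results:
        # Sort the items so that key order does not affect the comparison.
        key = tuple(sorted(result["node_bindings"].items()))
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


example = [
    {"node_bindings": {"n0": "CHEM:1", "n1": "GENE:1"}},
    {"node_bindings": {"n1": "GENE:1", "n0": "CHEM:1"}},  # same bindings, different order
    {"node_bindings": {"n0": "CHEM:1", "n1": "GENE:2"}},
]
print(len(dedupe_results(example)))
```

If results that differ only in their edges should be kept as separate answers, the key would need to include the edge bindings as well.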

karafecho commented 1 year ago

I didn't make the connection with the limit that Nyssa (or ExEmplar by default) placed on the number of returned results. Thanks for pointing that out. However, it might be helpful for users if ROBOKOP could place a default limit on the number of results it returns, based on something smart like scores, and then notify the user that they can ask for additional results or wait for them, much as the initial version of ROBOKOP did and the Translator UI currently does. The email notification that the initial version of ROBOKOP offered ("we will send you an email when your results are ready") would also work, but that would require a login option, I think.
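
As a rough illustration of the default-limit idea, the sketch below ranks results by score, returns only the top few, and builds a notice telling the user that more are available. The function name `limit_results`, the `score` field, and the default of 100 are assumptions made for the example, not ROBOKOP behavior.

```python
def limit_results(results, max_results=100):
    """Return the highest-scoring results plus an optional user notice.

    Assumption: each result is a dict with a numeric "score"; results
    without a score are treated as 0.0.
    """
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    shown = ranked[:max_results]
    hidden = len(ranked) - len(shown)
    notice = None
    if hidden > 0:
        notice = (
            f"Showing the top {len(shown)} of {len(ranked)} results by score; "
            f"{hidden} more are available on request."
        )
    return shown, notice


sample = [{"id": f"r{i}", "score": i / 10} for i in range(5)]
shown, notice = limit_results(sample, max_results=3)
print([r["id"] for r in shown])  # ['r4', 'r3', 'r2']
print(notice)
```

The notice is returned separately from the results so that the interface can decide how to present it, for example as a banner with a "load more" option.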

I believe there are two other related issues here.

The first is strictly about reliable performance. For instance, the query above ran the first time I submitted it but then failed to complete the second time.

The second issue is related to the first issue. Presumably, ROBOKOP timed out the second time I ran the query, but it did not give the user any indication as to why the query suddenly stopped running, e.g., no error message was returned, no "come back later" message was returned, nothing.
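
To make the second point concrete, here is a minimal sketch of a wrapper that always tells the user how a query ended: answers, no answers, a timeout, or an error. The function name `run_with_timeout`, the status values, and the message wording are hypothetical; `fetch` stands in for whatever call actually retrieves answers.

```python
import concurrent.futures
import time


def run_with_timeout(fetch, timeout_s):
    """Run fetch() and always return a status dict with a user-facing message."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fetch)
    try:
        results = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return {
            "status": "timeout",
            "message": f"The query did not finish within {timeout_s} s. "
                       "Try a more specific query or come back later.",
        }
    except Exception as exc:  # any failure raised inside fetch()
        return {"status": "error", "message": f"The query failed: {exc}"}
    finally:
        # Do not block on a still-running fetch; its thread finishes on its own.
        executor.shutdown(wait=False)
    if not results:
        return {"status": "empty", "message": "No answers are available for this query."}
    return {"status": "ok", "results": results,
            "message": f"{len(results)} answers returned."}


def slow_fetch():
    time.sleep(0.3)  # simulated long-running query
    return ["answer"]


print(run_with_timeout(slow_fetch, timeout_s=0.05)["status"])  # timeout
print(run_with_timeout(lambda: [], timeout_s=1)["status"])     # empty
```

The key design choice is that every path returns a message, so the interface never stops silently.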

cbizon commented 1 year ago

OK, since this one spun out multiple issues, I am going to close it. The child issues point back here to seed the discussion.