geneontology / go-fastapi

https://api.geneontology.org/

GO API giving unexpected results after refactored code switchover #77

Closed kltm closed 1 year ago

kltm commented 1 year ago

The original GO Helpdesk query (slightly edited to remove identifying information) below.


[...] I am writing regarding the Gene Ontology API providing unpredictable and confusing responses when using the link /api/bioentity/function/{id}/genes/, as is explained in the GO API documentation (https://api.geneontology.org/).

For example, consider the term GO:0046330 (positive regulation of JNK cascade). When browsing in Amigo 2 browser and applying the Homo Sapiens filter among the provided organisms, 101 annotations exist (between the aforementioned GO Term and the associated genes).

The constructed query link for this example would be: https://api.geneontology.org/api/bioentity/function/GO:0046330/genes . Just five days ago, the above link returned a very long JSON response, which our code parsed correctly to find all annotations. However, in recent days, querying the above example link has returned no annotations for the Homo sapiens taxon (NCBITaxon:9606). On the other hand, if I add request parameters, such as setting the taxon to Homo sapiens (giving the link https://api.geneontology.org/api/bioentity/function/GO:0046330/genes?taxon=NCBITaxon:9606), the response does return some of the Homo sapiens annotations that should already have been present in the response to the initial link, where we query all annotated genes without specifying a taxon.

[We] are writing a tool that is based on these queries. Therefore, we would greatly appreciate your input in this confusion. [...]

kltm commented 1 year ago

Looking at this API endpoint, I would expect the results of

https://api.geneontology.org/api/bioentity/function/GO:0046330/genes?rows=500

to be a superset of

https://amigo.geneontology.org/amigo/search/bioentity?q=*:*&fq=isa_partof_closure:%22GO:0046330%22&sfq=document_category:%22bioentity%22

However, no matter the parameters I send, I seem to get only 10 results. It seems like there may be some quirks to the parameter parsing here in the refactored code?

Tagging @sierra-moxon
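For anyone wanting to reproduce the check above, here is a minimal sketch in Python. The helper names `build_genes_url` and `count_associations` are hypothetical, and the `associations` key in the JSON response is an assumption about the payload shape rather than something confirmed in this thread.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://api.geneontology.org/api/bioentity/function"


def build_genes_url(go_id, **params):
    """Build the /genes URL for a GO term, skipping parameters set to None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    # Keep the colon in the GO ID readable, as in the URLs quoted above.
    url = f"{BASE_URL}/{quote(go_id, safe=':')}/genes"
    return f"{url}?{query}" if query else url


def count_associations(url):
    """Fetch the URL and count the returned associations.

    Assumption: the response is a JSON object with an "associations" list.
    """
    with urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return len(payload.get("associations", []))


if __name__ == "__main__":
    # If `rows` is honored, the counts should differ for a term with
    # more than 10 annotated genes.
    for rows in (10, 100, 500):
        url = build_genes_url("GO:0046330", rows=rows)
        print(rows, count_associations(url))
```

If every `rows` value prints the same count, the parameter is probably not reaching the underlying query.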

kltm commented 1 year ago

As this is a "new" issue that we need to deal with independently of the overall refactor, I'm going to bump this over to the proactive software / bugs project.

sierra-moxon commented 1 year ago

It looks like when I set the default number of rows in the new API to 100, it caused this code: https://github.com/biolink/ontobio/blob/f86d367fa5c4c85ea6ce8743166ac072a6d66115/ontobio/golr/golr_query.py#L1372 to never be executed, and thus, this default was used.

sierra-moxon commented 1 year ago

Deployed the fix to "PROD B": https://api-b.geneontology.org/api/bioentity/function/GO%3A0046330/genes?taxon=NCBITaxon%3A9606&relationship_type=involved_in&start=0 now returns 101 associations, which matches the AmiGO count stated above. I also see a much larger number of associations without the human taxon filter, and NCBITaxon:9606 now appears in the non-taxon-filtered results.
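One way to confirm that human annotations show up in the unfiltered results is to tally associations by taxon. This is only a sketch: `taxon_counts` is a hypothetical helper, and the `subject.taxon.id` path is an assumption about how each association is structured in the response.

```python
from collections import Counter


def taxon_counts(associations):
    """Count associations per subject taxon ID.

    Assumption: each association looks like
    {"subject": {"taxon": {"id": "NCBITaxon:9606", ...}, ...}, ...}.
    Entries without a taxon are counted under "unknown".
    """
    counts = Counter()
    for assoc in associations:
        taxon = (assoc.get("subject") or {}).get("taxon") or {}
        counts[taxon.get("id", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Synthetic data for illustration only, not real API output.
    sample = [
        {"subject": {"taxon": {"id": "NCBITaxon:9606"}}},
        {"subject": {"taxon": {"id": "NCBITaxon:10090"}}},
        {"subject": {"taxon": {"id": "NCBITaxon:9606"}}},
        {"subject": {}},
    ]
    print(taxon_counts(sample))
```

Running this on the parsed `associations` list from an unfiltered query should show a nonzero count for NCBITaxon:9606 once the fix is in place.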

kltm commented 1 year ago

Awesome--thank you for the quick fix! I'm going to let this burn in for a day or so and then switch over the AmiGO widget. Tagging @pgaudet