ebi-uniprot / QuickGOBE

Repository containing the back-end logic for QuickGO
Apache License 2.0

REST API limit and page not considered #225

Closed deeenes closed 5 years ago

deeenes commented 5 years ago

Dear UniProt People,

I have the impression that the limit and page parameters of the QuickGO REST API annotation/downloadSearch query don't work. The resulting files always contain 100k data lines regardless of the limit value, and their content is identical whichever page I request. A simple example to show this:

curl -H 'Accept:text/tsv' -v -L -o quickgo_example_p1.tsv 'https://www.ebi.ac.uk/QuickGO/services/annotation/downloadSearch?page=1'

curl -H 'Accept:text/tsv' -v -L -o quickgo_example_p2.tsv 'https://www.ebi.ac.uk/QuickGO/services/annotation/downloadSearch?page=2'
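One way to confirm that the two downloads really are the same is to compare their checksums and line counts. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the two files written by the curl commands above are present locally.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_lines(path):
    """Count all lines in a file, including any header line."""
    with Path(path).open("rb") as handle:
        return sum(1 for _ in handle)


def compare_downloads(first, second):
    """Summarize whether two downloaded files are byte-identical."""
    return {
        "identical": sha256_of(first) == sha256_of(second),
        "lines": (count_lines(first), count_lines(second)),
    }
```

Calling `compare_downloads("quickgo_example_p1.tsv", "quickgo_example_p2.tsv")` would report `identical` as true if the page parameter had no effect on the response.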

Please tell me if I am doing anything wrong.

Best wishes,

Denes

eddturner commented 5 years ago

Dear Denes,

Thank you for getting in touch! You are right about the examples, thank you for spotting this. We need to update our API documentation (https://www.ebi.ac.uk/QuickGO/api/index.html#!/annotations/downloadLookupUsingGET) to highlight the fact that the page parameter is not used in the downloadSearch end-point -- this is the reason your two queries return the same results. We will create an internal ticket for this and put it in the backlog.

If you have a specific task you are trying to achieve, don't hesitate to get in touch.

Kind regards, The UniProt QuickGO team.

deeenes commented 5 years ago

Dear Edd,

Thank you for your reply.

If you drop these parameters and there is a global 100k limit on any query result, then it is not possible at all to get, for example, all GO annotations for all human proteins. Implementing pagination on the client side (by querying batches of terms or IDs over several queries) would take many hours, as QuickGO is apparently very slow to serve queries that are not already cached on the server side.
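To make the client-side workaround concrete, here is a rough sketch of splitting a list of IDs into batches and building one request URL per batch. The filter parameter name `geneProductId`, the comma-separated value format, and the helper names are assumptions for illustration only; the API documentation should be checked for the actual filter names and per-request limits.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/QuickGO/services/annotation/downloadSearch"


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_urls(ids, batch_size, id_param="geneProductId"):
    """Build one download URL per batch of IDs.

    `id_param` is an assumed filter name, not confirmed in this thread.
    """
    return [
        f"{BASE_URL}?{urlencode({id_param: ','.join(batch)})}"
        for batch in chunked(list(ids), batch_size)
    ]
```

Each URL would then be fetched in turn (for example with curl and the `Accept:text/tsv` header) and the results concatenated, which is exactly the multi-query approach that becomes slow when responses are not cached.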

Users are then left with downloading files from the FTP or going to AmiGO Solr, which apparently hasn't been updated for half a year. I am not aware of any other alternative.

Best wishes,

Denes

eddturner commented 5 years ago

Hi again Denes,

The QuickGO REST API is currently aimed at providing a suite of filtering capabilities that can be applied to the entire data-set (e.g., annotations) -- with an emphasis on filtering -- to restrict the data-set to something smaller (<= 100K) and, if necessary, allow its download. Indeed, it is not currently designed for downloading huge result sets, e.g., all annotations to human proteins. In this case, the recommended approach is to use FTP and post-process the files.
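As an illustration of the post-processing step, the sketch below extracts (gene product ID, GO ID) pairs for human proteins from an annotation file. It assumes the FTP file is in the tab-separated GAF 2.x layout (comment lines starting with `!`, DB Object ID in column 2, GO ID in column 5, taxon in column 13), which this thread does not state; the function names are hypothetical.

```python
import gzip


def taxon_annotations(lines, taxon="taxon:9606"):
    """Yield (object_id, go_id) pairs for rows whose primary taxon matches.

    Assumes GAF 2.x columns; rows with fewer than 13 columns are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue
        # The taxon column may hold "taxon:A|taxon:B"; the first entry
        # is the organism of the annotated gene product.
        if cols[12].split("|")[0] == taxon:
            yield cols[1], cols[4]


def read_annotation_file(path, taxon="taxon:9606"):
    """Stream matching pairs from a plain or gzip-compressed file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        yield from taxon_annotations(handle, taxon)
```

Because the file is streamed line by line, this approach does not need to hold the whole annotation set in memory, however large the download is.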

We do however have tickets on our internal backlog that seek to address the issue you raise (i.e., provide download of large/entire result sets, without killing our servers! :) ) and work will be starting on this when possible.

Kind regards,

Edd

deeenes commented 5 years ago

Dear Edd,

Many thanks for the explanation.

Best,

Denes