CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License
135 stars, 33 forks

fixing response headers of requests #367

Open rapw3k opened 2 years ago

rapw3k commented 2 years ago

Hi, I have a few questions regarding headers of requests. I am testing with this endpoint: http://grlc.io/api-git/rapw3k/cybele/#/json/get_allDatasets

However, page=1 gives the same result as a request without a page number, so shouldn't the next page be 2 if I don't send a page number?

Moreover, the links are actually incorrect: the domain contains grlc.io twice, i.e., http://grlc.io,grlc.io/. Also, why 10.0 instead of just 10?

thanks! Raul

c-martinez commented 2 years ago

Hi @rapw3k,

Thanks for your comments! I don't think this particular functionality has been extensively used, so it is not very polished: there is definitely room for improvement :-)

> shouldn't the next page be 2 if I don't send a page number?

Yes, you are right. Probably the "page" variable should be set to 1 if it is not present in the request.
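A minimal sketch of that default, not grlc's actual code: `parse_page` is a hypothetical helper, and `args` stands for any dict-like object holding the request's query parameters. Missing, empty, non-numeric, or non-positive values all fall back to page 1, so the "next" link computed from it would point to page 2.

```python
def parse_page(args, default=1):
    """Return the requested page as a positive int, falling back to `default`.

    `args` is assumed to be a dict-like mapping of query parameters
    (hypothetical; adapt to whatever the web framework provides).
    """
    raw = args.get("page")
    if raw is None or raw == "":
        return default
    try:
        page = int(raw)
    except (TypeError, ValueError):
        # Non-numeric input such as "abc" is treated as "no page given".
        return default
    return page if page >= 1 else default
```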

> the links are actually incorrect: the domain contains grlc.io twice, i.e., http://grlc.io,grlc.io/. Also, why 10.0 instead of just 10?

Not sure why the links are being generated like that, but indeed they look incorrect.
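For illustration, here is a sketch of how pagination links could be built so that page numbers stay integers and the host appears only once. The helper names are hypothetical, and the page size of 100 is taken from the numbers mentioned in this thread. The causes hinted at in the comments (true division producing `10.0`, a comma-joined host header producing `grlc.io,grlc.io`) are guesses, not something confirmed here.

```python
from urllib.parse import urlencode


def last_page(count, per_page):
    """Number of the last page, using integer ceiling division (10, not 10.0)."""
    return max(1, -(-count // per_page))


def first_host(header_value):
    """Keep only the first entry of a comma-joined host value.

    Assumption: a value like "grlc.io,grlc.io" comes from a proxy header
    that lists several hosts; this is a guess at the cause.
    """
    return header_value.split(",")[0].strip()


def build_link_header(base_url, page, count, per_page=100, extra_params=None):
    """Build a Link header value with prev/next/first/last relations."""
    params = dict(extra_params or {})
    last = last_page(count, per_page)

    def link(p, rel):
        query = urlencode({**params, "page": p})
        return f'<{base_url}?{query}>; rel="{rel}"'

    links = []
    if page > 1:
        links.append(link(page - 1, "prev"))
    if page < last:
        links.append(link(page + 1, "next"))
    links.append(link(1, "first"))
    links.append(link(last, "last"))
    return ", ".join(links)
```

With a count of 1000 and 100 results per page, page 10 is the last page and gets no "next" link, which matches the behaviour reported above.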

> - why do the response headers say the last page is page 10?
> - page 10 does not give a link to the next page
> - what will happen if I have more than 100 (x10 pages) = 1000 results?

These are all related: because counting results before executing the query is expensive, at the moment grlc just 'guesses' (or "Provides a dummy count for now") that there will be 1000 results (ugly hack):

https://github.com/CLARIAH/grlc/blob/d4ddb1530cfef57464a8dc31edddc44c1387fe77/src/gquery.py#L68-L73

Until now, we didn't have a good use case to justify the additional load of running an extra query to pre-calculate the number of results. But if this functionality would be useful to you, maybe we've finally got a reason to implement it properly.
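As a rough idea of what pre-calculating the count could look like (a sketch only, not how grlc does or will do it): the original SELECT can be wrapped as a subquery inside a `COUNT(*)` query. `make_count_query` is a hypothetical helper. It assumes the caller passes the PREFIX declarations separately, since those cannot appear inside a subquery, and that the SELECT carries no LIMIT/OFFSET, which would otherwise cap the count.

```python
def make_count_query(prologue, select_query):
    """Wrap a SPARQL SELECT query in a COUNT(*) query.

    prologue:     PREFIX/BASE declarations, kept at the top level.
    select_query: the SELECT query without prologue and without
                  LIMIT/OFFSET (assumption: the caller strips those).
    """
    return (
        f"{prologue.strip()}\n"
        "SELECT (COUNT(*) AS ?count) WHERE {\n"
        f"  {{ {select_query.strip()} }}\n"
        "}"
    )


if __name__ == "__main__":
    print(make_count_query(
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>",
        "SELECT ?d WHERE { ?d a dcat:Dataset }",
    ))
```

The extra round trip to the endpoint is exactly the additional load mentioned above, so it might be worth making it optional or caching the result.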

Is the paging functionality something you would need for your use case? Are there any particular considerations you think should be taken into account?

@albertmeronyo -- what do you think? Do you know if there are other use cases which would benefit from this functionality?

rapw3k commented 2 years ago

Thanks for the reply, @c-martinez. Indeed, we have some cases where we are returning tens of results, and paging becomes quite relevant in order to get them.

Some problematic points I see, apart from the ones mentioned above:

https://grlc.io/api-git/cybele-project/metadata/allDatasets_testbed?testbed=https://w3id.org/cybele/datasets/PSNC&page=70

https://grlc.io/api-git/cybele-project/metadata/allDatasets_testbed?testbed=https://w3id.org/cybele/datasets/PSNC&page=72