Materials-Consortia / OPTIMADE

Specification of a common REST API for access to materials databases
https://optimade.org/specification
Creative Commons Attribution 4.0 International
83 stars 37 forks source link

Should pagination links include all URL query parameters (e.g. response fields)? #391

Closed ml-evs closed 1 year ago

ml-evs commented 2 years ago

This point was raised on the forums (by @oarcelus) here: https://matsci.org/t/cod-response-does-not-include-response-fields-in-next-link/39530.

Currently the pymatgen OptimadeRester (and presumably other clients) rely on the fact that they can just use the links->next URL directly for pagination, but it seems that some databases (for example, COD in the forum post @merkys) do not append URL parameters like response_fields to the pagination links, which currently breaks clients that expect e.g., COD, to return null for cartesian_site_positions.

I do not think the specification currently specifies exactly how this should behave, hence the confusion.

merkys commented 2 years ago

I am almost sure the response_fields should be retained (I think this is the original intention of the specification, even if not explicitly described). Therefore, this is a bug in the COD OPTIMADE implementation, and I will look into fixing it.

I believe the original intention for the pagination URLs is that they have to contain all the parameters required to get the next page of results without any additional parameters. I will also give the specification a look just to be 100% sure.

merkys commented 2 years ago

OK, I gave the specification a look, and it does not seem to be explicit regarding this. Nevertheless, I see only advantages in retaining the original URL parameters: The client does not need to "remember" the request. All its details are included in the URL parameters, thus fetching paginated results is not much more difficult than slurping non-paginated response (bonus: clients are encouraged to paginate).

Moreover, there were discussions (private exchange) about page limits depending on response_fields. In the COD we would like to reduce page limits to 10 when a client requests for cartesian_site_positions, as opposed to page limits of 100 or 1000 when atom positions are not requested. This would play nicely with retained URL parameters, and quite clumsy without them.

The single advantage of placing the onus of URL parameter setting on the client is slight increase of flexibility. By not giving any guarantees about the URL parameters, the specification would push the clients to implement URL string parsing and modification. But clients are already free to do so.

rartino commented 2 years ago

This is what the specification says:

next: represents a link to fetch the next set of results. When the current response is the last page of data, this field MUST be either omitted or null-valued.

I agree that it could be more explicit, but IMO the implicit expectation of "the next set of results" is that they contain the same fields as the previous set - otherwise they are not the same results.

Also important: the specification says nothing about that the URL provided in next needs to be interpretable in terms of OPTIMADE (strictly it doesn't even need to point to the same host...), so it would be a bug for a client to assume that it can understand the next link and re-write it to, e.g., add parameters. Hence, if OPTIMADE does not require the next link to give results with the same fields, then there would exist no way for a client to reliably paginate.

merkys commented 2 years ago

Fixed the COD implementation today. Now the URL parameters should be propagated.

merkys commented 1 year ago

Can we close this issue as answered, or should we clarify the specification accordingly?

ml-evs commented 1 year ago

I think we can close it, unless someone really wants to make a PR making this very explicit.