Closed ml-evs closed 1 year ago
I am almost sure the response_fields
should be retained (I think this is the original intention of the specification, even if not explicitly described). Therefore, this is a bug in the COD OPTIMADE implementation, and I will look into fixing it.
I believe the original intention for the pagination URLs is that they have to contain all the parameters required to get the next page of results without any additional parameters. I will also give the specification a look just to be 100% sure.
OK, I gave the specification a look, and it does not seem to be explicit regarding this. Nevertheless, I see only advantages in retaining the original URL parameters: The client does not need to "remember" the request. All its details are included in the URL parameters, thus fetching paginated results is not much more difficult than slurping non-paginated response (bonus: clients are encouraged to paginate).
Moreover, there were discussions (private exchange) about page limits depending on response_fields
. In the COD we would like to reduce page limits to 10 when a client requests for cartesian_site_positions
, as opposed to page limits of 100 or 1000 when atom positions are not requested. This would play nicely with retained URL parameters, and quite clumsy without them.
The single advantage of placing the onus of URL parameter setting on the client is slight increase of flexibility. By not giving any guarantees about the URL parameters, the specification would push the clients to implement URL string parsing and modification. But clients are already free to do so.
This is what the specification says:
next: represents a link to fetch the next set of results. When the current response is the last page of data, this field MUST be either omitted or null-valued.
I agree that it could be more explicit, but IMO the implicit expectation of "the next set of results" is that they contain the same fields as the previous set - otherwise they are not the same results.
Also important: the specification says nothing about that the URL provided in next
needs to be interpretable in terms of OPTIMADE (strictly it doesn't even need to point to the same host...), so it would be a bug for a client to assume that it can understand the next link and re-write it to, e.g., add parameters. Hence, if OPTIMADE does not require the next
link to give results with the same fields, then there would exist no way for a client to reliably paginate.
Fixed the COD implementation today. Now the URL parameters should be propagated.
Can we close this issue as answered, or should we clarify the specification accordingly?
I think we can close it, unless someone really wants to make a PR making this very explicit.
This point was raised on the forums (by @oarcelus) here: https://matsci.org/t/cod-response-does-not-include-response-fields-in-next-link/39530.
Currently the pymatgen
OptimadeRester
(and presumably other clients) rely on the fact that they can just use thelinks->next
URL directly for pagination, but it seems that some databases (for example, COD in the forum post @merkys) do not append URL parameters likeresponse_fields
to the pagination links, which currently breaks clients that expect e.g., COD, to returnnull
forcartesian_site_positions
.I do not think the specification currently specifies exactly how this should behave, hence the confusion.