barracuda-cloudgen-access / access-cli

Access all CloudGen Access Enterprise Console APIs from the command line.
Apache License 2.0

Pagination #2

Closed gbl08ma closed 5 years ago

gbl08ma commented 5 years ago

The client has no notion of how pagination in the API operates. We should implement pagination (or should the API be changed so that pagination is not used with this client?).

oNaiPs commented 5 years ago

IMO pagination should always be there. Perhaps we could have a default table limit that can be customized?

E.g.

--limit=100
--limit=max
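For illustration, a rough sketch of how such a --limit flag could be parsed on the client side (the flag handling and the maxPerPage ceiling are assumptions, not actual access-cli code):

```go
package main

import (
	"fmt"
	"strconv"
)

const maxPerPage = 1000 // assumed server-side ceiling, not a real API constant

// parseLimit turns a --limit value into a concrete record count,
// accepting either a positive number or the literal "max".
func parseLimit(s string) (int, error) {
	if s == "max" {
		return maxPerPage, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 1 {
		return 0, fmt.Errorf("invalid --limit value %q", s)
	}
	if n > maxPerPage {
		n = maxPerPage // clamp to the assumed ceiling
	}
	return n, nil
}

func main() {
	fmt.Println(parseLimit("100")) // 100 <nil>
	fmt.Println(parseLimit("max")) // 1000 <nil>
}
```
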
luisalima commented 5 years ago

@gbl08ma I agree with @oNaiPs -- the API should not be changed to accommodate a particular client request.

We are using RESTful link pagination. One good example of the standard is here: https://developer.github.com/v3/guides/traversing-with-pagination/
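For illustration, a minimal sketch of how a Go client could traverse rel="next" Link headers (the endpoint URL is made up, and the hand-rolled header parsing is simplified, not fully RFC-robust):

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// nextLink extracts the rel="next" URL from a Link header, or "" if absent.
// Splitting on "," works for typical Link headers like
// <https://...?page=2>; rel="next", <https://...?page=5>; rel="last".
func nextLink(h http.Header) string {
	for _, part := range strings.Split(h.Get("Link"), ",") {
		seg := strings.Split(part, ";")
		if len(seg) < 2 {
			continue
		}
		url := strings.Trim(strings.TrimSpace(seg[0]), "<>")
		for _, p := range seg[1:] {
			if strings.TrimSpace(p) == `rel="next"` {
				return url
			}
		}
	}
	return ""
}

func main() {
	url := "https://api.example.com/v1/users" // assumed endpoint
	for url != "" {
		resp, err := http.Get(url)
		if err != nil {
			panic(err)
		}
		resp.Body.Close() // a real client would decode the page body here
		url = nextLink(resp.Header)
		fmt.Println("next page:", url)
	}
}
```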

luisalima commented 5 years ago

@gbl08ma maybe, in the client, we could abstract away the pages and ask for records directly. For example, records starting at 100 and going to 200:

--record-start=100 --record-end=200

There might be some examples in other CLIs.
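For illustration, a sketch of how such record offsets could be mapped onto page/per-page requests (hypothetical names; assumes a half-open record range [start, end) and a fixed page size):

```go
package main

import "fmt"

const perPage = 50 // assumed page size

// pagesFor returns the 1-based pages covering the half-open record range
// [start, end), plus how many records to drop from the first fetched page.
func pagesFor(start, end int) (firstPage, lastPage, skipFirst int) {
	firstPage = start/perPage + 1
	lastPage = (end-1)/perPage + 1
	skipFirst = start % perPage
	return
}

func main() {
	fmt.Println(pagesFor(100, 200)) // 3 4 0: pages 3-4 hold records 100-199
}
```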

gbl08ma commented 5 years ago

Pagination, as it works right now, makes sense when the client is displaying results to the user. But if the client is retrieving, say, the full list of users (which might have hundreds of records) to dump into a CSV file, and each page has 20 records, it would need to make dozens of requests, which seems terribly inefficient and slow. It is still definitely doable; it is a question of how many requests (and DB transactions, etc.) one is willing to serve to accomplish such a task. Ideally, the client would have a way to ask the server for larger pages with each request, up to a sensible limit (100 to 1000 records, for example).

APIs often provide a way to specify a starting ID and a limit on the number of records to retrieve, especially if the IDs are sortable. This is the case when retrieving messages in the Discord API (limited to 100 records per request), when listing objects in Amazon S3 buckets (limited to 1000 keys), etc.
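For concreteness, here is a sketch of that cursor style, using a made-up after-ID parameter and an in-memory stand-in for the server (none of these names come from the Console API):

```go
package main

import "fmt"

type record struct{ ID int }

// fetchAfter stands in for an API call such as Discord's ?after=<id>&limit=<n>;
// here it serves 250 fake records with sequential IDs.
func fetchAfter(afterID, limit int) []record {
	var out []record
	for i := afterID + 1; i <= afterID+limit && i <= 250; i++ {
		out = append(out, record{ID: i})
	}
	return out
}

func main() {
	const limit = 100
	after := 0
	for {
		batch := fetchAfter(after, limit)
		if len(batch) == 0 {
			break
		}
		after = batch[len(batch)-1].ID // resume from the last ID seen
		fmt.Printf("got %d records, cursor now at %d\n", len(batch), after)
	}
}
```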

luisalima commented 5 years ago

@gbl08ma you can already specify those options to the API. We comply with the standard. Although maybe we could think about enabling HTTP streaming for large requests...

gbl08ma commented 5 years ago

As far as I can see, the API accepts a page and a per-page parameter. These are not record IDs, and I think this may lead to atomicity/isolation failures if records are added or removed between requests. For example, assuming per-page is 2 and there are initially records A, B, C, D and E:

  1. Client 1 retrieves page 1, server replies with [A, B]
  2. Record B is removed by client 2
  3. Client 1 retrieves page 2, server replies with [D, E] (record C is missed, because the first page is now [A, C])

Am I missing something, or is this situation possible with the current pagination implementation?

luisalima commented 5 years ago

@gbl08ma don't forget that IDs may be -- and in many cases in our API are -- UUIDs, which are not sortable.

Yes, that is a good observation, and that situation is indeed possible (though probably unlikely in this use case unless the CLI is being used in a script, since there is only a limited number of admins for each company).

If we want to prevent that situation, we will need to extend the API pagination, but I would suggest using a time filter instead of relying on IDs for now: for example, records from one timestamp to another, so as not to rely on a potentially unorderable field (rough sketch below). Makes sense? WDYT?
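To make the time-filter idea concrete, a rough sketch of the client side, assuming a hypothetical created_after parameter (not part of the current API):

```go
package main

import (
	"fmt"
	"time"
)

type user struct {
	ID        string    // UUID; no meaningful sort order
	CreatedAt time.Time // sortable secondary key
}

// fetchCreatedAfter stands in for something like
// GET /users?created_after=<ts>&limit=<n>, ordered by CreatedAt.
// Ties on CreatedAt would need a tie-breaker (e.g. the UUID) to avoid skips.
func fetchCreatedAfter(after time.Time, limit int) []user {
	return nil // a real implementation would call the API here
}

func main() {
	cursor := time.Time{} // start from the zero time
	for {
		batch := fetchCreatedAfter(cursor, 100)
		if len(batch) == 0 {
			break
		}
		cursor = batch[len(batch)-1].CreatedAt // advance the time cursor
		fmt.Println("fetched", len(batch), "users up to", cursor)
	}
}
```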

gbl08ma commented 5 years ago

I agree. Most APIs I have used either use sequential IDs or something derived from time (like snowflakes, in the case of Twitter and Discord). Usually, when the primary ID is not sortable, a secondary one (such as a creation/update timestamp) is used, which is in line with your example. It is very rare that there is absolutely no logical way to sort a set of records, and even in those cases (for example, if records consist of a single column of UUIDs), the IDs can be sorted lexicographically, exclusively for pagination purposes.

I agree that with the current pagination mechanism, the problem I described should be very uncommon, especially if the client uses large pages (the fewer requests per operation, the lower the chance for such problems to occur). When I created this issue, I was not aware of the per-page parameter.

gbl08ma commented 5 years ago

This is solved, for now, in 6c5e8dd6e7818f34a7f6ca8d617d6121b164a058. Sadly, the client generated by go-swagger does not include any pagination helpers, so we must write them ourselves, and Go does not support generics, which would be useful in this kind of code. Pagination will be a concern for each command/function that requests paginated records from the server; thanks to the helper function, it can be handled with fewer than 10 extra lines of code per command (see the sketch below).
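For illustration, a callback-based helper along these lines is feasible without generics (this is a sketch, not the actual code from the commit above):

```go
package main

import "fmt"

// pageFetcher requests one page and reports how many records it contained.
type pageFetcher func(page int) (count int, err error)

// forAllPages calls fetch for successive pages until a short or empty page
// signals the end of the collection.
func forAllPages(perPage int, fetch pageFetcher) error {
	for page := 1; ; page++ {
		n, err := fetch(page)
		if err != nil {
			return err
		}
		if n < perPage {
			return nil // last page reached
		}
	}
}

func main() {
	users := []string{} // each command accumulates its own concrete type
	err := forAllPages(2, func(page int) (int, error) {
		batch := fakeListUsers(page, 2) // stands in for the swagger client call
		users = append(users, batch...)
		return len(batch), nil
	})
	fmt.Println(users, err) // [A B C D E] <nil>
}

// fakeListUsers simulates a server holding 5 records with per-page=2.
func fakeListUsers(page, perPage int) []string {
	all := []string{"A", "B", "C", "D", "E"}
	start := (page - 1) * perPage
	if start >= len(all) {
		return nil
	}
	end := start + perPage
	if end > len(all) {
		end = len(all)
	}
	return all[start:end]
}
```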

luisalima commented 5 years ago

Thanks @gbl08ma; from what I understand, you are now retrieving all pages, correct?

gbl08ma commented 5 years ago

Yes. This issue was more about the fact that the client was only retrieving the first page, missing data. I created #6 to track the progress of more advanced pagination features.