Open hyperbolix opened 8 months ago
@hyperbolix I've been reading this ticket and I'm a little confused here. The index pagination is intended to be used when the API in question returns an index or an index link to indicate the next page. With that said, the parameters page_size and total_results suggest that the correct pagination method would be page pagination (https://github.com/apache/drill/blob/master/contrib/storage-http/Pagination.md#page-pagination). Page pagination is when the API provides paging information such as the page size and result count.
If you use page pagination with a limit, that will work, and there are unit tests covering that case.
This test rig doesn't exactly represent the system I was experimenting with in Drill when I discovered this bug. It was simply the most expedient way to demonstrate the issue. Being able to tune the total result size and the max results per page helped prove that the issue occurs at various page sizes, and it helped to keep the 'next_page' token simple.
I appreciate your guidance in the matter; I can assure you that I am actually using page tokens in my experiments, which necessitates the use of the 'index' pagination method. I can't recall a single cloud computing list API that doesn't use page tokens, and if I came across one, I'd be worried about using it.
Describe the bug HTTP Storage Plugin INDEX pagination mode is field-order sensitive and should not be (JSON is supposed to be unordered). When the pagination fields 'has_next' and 'next_page' come after the data path field 'results', the presence of a next page is ignored. The next page fields are only honored when they come before the data path field.
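To illustrate the order-independence being asked for, here is a minimal Python sketch. It is not Drill's implementation. The two payloads and the helper next_page_token are hypothetical; only the field names 'has_next', 'next_page', and 'results' come from this report. The sketch assumes the whole response object is parsed before any pagination field is read, so field order cannot matter.

```python
import json

# Two hypothetical responses with identical content; only the field order differs.
TOKENS_FIRST = '{"has_next": true, "next_page": "p2", "results": [{"id": 1}, {"id": 2}]}'
RESULTS_FIRST = '{"results": [{"id": 1}, {"id": 2}], "has_next": true, "next_page": "p2"}'


def next_page_token(body):
    """Return the next-page token, or None when there is no further page."""
    payload = json.loads(body)  # the whole object is parsed before any field is read
    if payload.get("has_next"):
        return payload.get("next_page")
    return None


# Both orderings yield the same token because dict lookup ignores field order.
print(next_page_token(TOKENS_FIRST))
print(next_page_token(RESULTS_FIRST))
```

A streaming reader that emits rows as soon as it reaches 'results' would need to keep reading to the end of the object before deciding whether another page exists; that trade-off is presumably where the order sensitivity comes from, though the report does not confirm the cause.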
To Reproduce Steps to reproduce the behavior:
select * from http.a_bug_test where total_results_val=300 and page_size_val=10 limit 15
Expected behavior The INDEX pagination mode page token fields should be honored regardless of field ordering in the JSON response. Or, at a minimum, the documentation should clearly indicate that these fields must come before the results field (which is not the order in the example in the HTTP pagination readme today).
Error detail, log output or screenshots When the 'has_next' and 'next_page' fields come after the data path 'results' field, they are ignored, meaning the result set is incorrectly trimmed short, as though there are not additional pages, yet there are additional pages.
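The effect on the reproduction query can be sketched with a small simulation, again in Python and not taken from Drill. fetch_page and collect are hypothetical stand-ins for the test API and the pagination loop, and the sketch assumes every page except possibly the last carries exactly page_size rows. The report itself only states that the result set is trimmed short; the concrete row count below follows from that assumption.

```python
def fetch_page(token, total_results, page_size):
    """Hypothetical paged API: the token is the offset of the next row, as a string."""
    start = int(token) if token else 0
    end = min(start + page_size, total_results)
    return {
        "results": list(range(start, end)),  # data path field comes first
        "has_next": end < total_results,
        "next_page": str(end) if end < total_results else None,
    }


def collect(limit, total_results, page_size, honor_next_page=True):
    """Fetch pages until the limit is met or no further page is reported."""
    rows, token = [], None
    while len(rows) < limit:
        page = fetch_page(token, total_results, page_size)
        rows.extend(page["results"])
        # Ignoring the trailing pagination fields models the behavior in this report.
        if not (honor_next_page and page["has_next"]):
            break
        token = page["next_page"]
    return rows[:limit]


expected = collect(limit=15, total_results=300, page_size=10)
truncated = collect(limit=15, total_results=300, page_size=10, honor_next_page=False)
print(len(expected), len(truncated))
```

Under these assumptions, honoring the token requires two page fetches to satisfy limit 15, while ignoring it stops after the first page.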
Drill version Observed in 1.21.1 and also in the latest commit as of 2024-03-26: 749772cb0bd83c1a8fe455410ec80b1e5a9bf239