airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
14.46k stars 3.71k forks

Low-Code: Page Increment won't stop if last page is full #28226

Open domzae opened 10 months ago

domzae commented 10 months ago

Topic

Low-Code CDK Paginator failure

Relevant information

I'm currently working on a Low-Code connector for the LearnWorlds API. It uses Page Increment for pagination. However, if the number of items on the last page equals the page size, a request is made for the next page, and instead of failing it returns the last page again. This repeats indefinitely, so the connector duplicates the last page forever.

To give a clearer example: imagine we use the event-logs endpoint, which has a limit of 50 event logs per page, and there happen to be exactly 100 event logs:

  1. The first request is made to {url}/v2/event-logs?page=1. It returns event logs 1-50. Because it returned 50 records, it looks for the next page.
  2. The second request is made to {url}/v2/event-logs?page=2. It returns event logs 51-100. Because it returned 50 records, it looks for the next page.
  3. The third request is made to {url}/v2/event-logs?page=3. It returns event logs 51-100. Because it returned 50 records, it looks for the next page.
  4. The fourth request is made to {url}/v2/event-logs?page=4. It returns event logs 51-100. Because it returned 50 records, it looks for the next page. (...and so on)
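The sequence above can be reproduced with a small simulation. This is only a sketch: `fake_event_logs` is a hypothetical stand-in for the API (it is assumed to serve the last page again for any out-of-range page number, as described), and `max_requests` exists only so the demo terminates.

```python
from typing import Any, Dict, List

PAGE_SIZE = 50
TOTAL_ITEMS = 100


def fake_event_logs(page: int) -> Dict[str, Any]:
    """Hypothetical stand-in for the API: out-of-range pages return the last page again."""
    total_pages = -(-TOTAL_ITEMS // PAGE_SIZE)  # ceiling division
    served_page = min(max(page, 1), total_pages)  # clamp instead of failing
    start = (served_page - 1) * PAGE_SIZE + 1
    end = min(served_page * PAGE_SIZE, TOTAL_ITEMS)
    return {"data": list(range(start, end + 1))}


def read_with_page_size_check(max_requests: int) -> List[int]:
    """Keep paginating while the page is full; max_requests only ends the demo."""
    records: List[int] = []
    page = 1
    for _ in range(max_requests):
        data = fake_event_logs(page)["data"]
        records.extend(data)
        if len(data) < PAGE_SIZE:  # never true here: every page is full
            break
        page += 1
    return records


records = read_with_page_size_check(max_requests=5)
print(len(records), len(set(records)))  # 250 100
```

Five requests yield 250 records but only 100 distinct ones: pages 3, 4 and 5 are copies of page 2, and without the request cap the loop would not end.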

As you can see, even if the requested page doesn't exist, the request is "successful", which is problematic. The response does return some metadata:

{
  "data": [
    ...
  ],
  "meta": {
    "page": 1,
    "totalItems": 100,
    "totalPages": 2,
    "itemsPerPage": 50
  }
}

So it would be possible to check page == meta.totalPages to know when we've reached the last page, but no such option seems to be available.
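As a rough illustration of that check, outside the Low-Code framework: the helper below is hypothetical (`is_last_page` is not a CDK function) and compares the page number that was requested against `meta.totalPages`, since the issue does not say what `meta.page` contains for an out-of-range request.

```python
from typing import Any, Mapping


def is_last_page(requested_page: int, response: Mapping[str, Any]) -> bool:
    """Hypothetical helper: decide from the response metadata whether to stop."""
    total_pages = response.get("meta", {}).get("totalPages")
    if total_pages is None:
        # Assumed fallback when the metadata is missing: stop on an empty page.
        return not response.get("data")
    return requested_page >= total_pages


# Response shaped like the one above (records elided to placeholders).
response = {
    "data": ["record"] * 50,
    "meta": {"page": 2, "totalItems": 100, "totalPages": 2, "itemsPerPage": 50},
}

print(is_last_page(1, response))  # False
print(is_last_page(2, response))  # True
print(is_last_page(3, response))  # True
```

Using `>=` rather than `==` means the check still stops if a request overshoots the last page.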

Is there a way around this I'm not seeing? Or does some feature need to be added to make this possible with Low-Code?

girarda commented 5 months ago

This can be implemented by adding a stop_condition to the pagination strategies similar to CursorPaginationStrategy's.

This could be implemented either in the strategies, at the cost of some duplication, or in the default paginator, at the cost of inconsistency, since we cannot remove the stop condition from the cursor strategy without a breaking change.
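A minimal sketch of the first option (a stop condition inside the strategy) might look like the following. `PageIncrementWithStop` and its method signature are hypothetical and are not the CDK's actual classes; the stop condition is modelled as a plain Python callable instead of an interpolated expression, and it assumes `meta.page` reflects the page that was served.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping, Optional

# (response body, records on the last page) -> True when pagination should stop
StopCondition = Callable[[Mapping[str, Any], List[Any]], bool]


@dataclass
class PageIncrementWithStop:
    """Hypothetical page-increment strategy with an optional stop condition."""

    page_size: int
    start_from_page: int = 1
    stop_condition: Optional[StopCondition] = None

    def __post_init__(self) -> None:
        self._page = self.start_from_page

    def next_page_token(
        self, response: Mapping[str, Any], last_records: List[Any]
    ) -> Optional[int]:
        # The explicit stop condition is checked before the page-size heuristic.
        if self.stop_condition is not None and self.stop_condition(response, last_records):
            return None
        if len(last_records) < self.page_size:
            return None
        self._page += 1
        return self._page


strategy = PageIncrementWithStop(
    page_size=50,
    stop_condition=lambda response, _records: (
        response["meta"]["page"] >= response["meta"]["totalPages"]
    ),
)

full_page = list(range(50))
first = strategy.next_page_token({"meta": {"page": 1, "totalPages": 2}}, full_page)
second = strategy.next_page_token({"meta": {"page": 2, "totalPages": 2}}, full_page)
print(first, second)  # 2 None
```

With the stop condition in place, the full second page no longer triggers a request for page 3. Putting the same check in the default paginator instead would avoid repeating it per strategy, which is the trade-off described above.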

girarda commented 5 months ago

grooming notes: