airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-younium] stop condition fails #36383

Open: magsaas opened this issue 6 months ago

magsaas commented 6 months ago

Connector Name

source-younium

Connector Version

0.2.0

At what step did the error happen?

During the sync

Relevant information

We use the Younium connector, and the pagination stop condition for the Bookings endpoint is failing.

We currently have 700 bookings in Younium, which amounts to 7 pages of 100. However, the connector doesn't stop after page 7. It requests page 8, which leads to the error below.

Steps to reproduce using an API client like Postman:

  1. Authenticate to obtain the bearer token.
  2. Call https://api.younium.com/Bookings?PageSize=100&PageNumber=1 on an account that has at least one booking.
     If it's not the last page, the response contains a nextPage key whose value is the URL of the next page. Example response for page 6:
       {
         "pageNumber": 6,
         "pageSize": 100,
         "totalPages": 7,
         "totalCount": 700,
         "nextPage": "https://api.younium.com/Bookings?pageNumber=7&PageSize=100",
         "data": [confidential]
       }

     If it's the last page, the response has no nextPage key. Instead, it contains a lastPage key pointing to the URL that was just requested. Example response for page 7:
       {
         "pageNumber": 7,
         "pageSize": 100,
         "totalPages": 7,
         "totalCount": 700,
         "lastPage": "https://api.younium.com/Bookings?pageNumber=7&PageSize=100",
         "data": [confidential]
       }
  3. Call https://api.younium.com/Bookings?PageSize=100&PageNumber=X, where X is a page you know doesn't exist (i.e. the last page + 1). The API replies with 400 Bad Request: No Bookings Found.
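To illustrate the behavior the steps above imply, here is a minimal Python sketch of a client that follows nextPage links and stops as soon as a page no longer has one. `fake_fetch` and `BadRequest` are hypothetical stand-ins for the real API: they fabricate responses shaped like the sample responses (700 bookings, 100 per page) and are not Airbyte or Younium code.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import parse_qs, urlparse

BASE_URL = "https://api.younium.com/Bookings"
TOTAL_COUNT = 700
PAGE_SIZE = 100
TOTAL_PAGES = -(-TOTAL_COUNT // PAGE_SIZE)  # ceiling division: 700 / 100 -> 7


class BadRequest(Exception):
    """Hypothetical stand-in for the API's 400 response."""


def fake_fetch(url: str) -> Dict[str, Any]:
    """Hypothetical stand-in for GET <url>, shaped like the sample responses."""
    # Lower-case query keys: the first URL uses PageNumber, nextPage uses pageNumber.
    query = {k.lower(): v[0] for k, v in parse_qs(urlparse(url).query).items()}
    number = int(query["pagenumber"])
    if number < 1 or number > TOTAL_PAGES:
        raise BadRequest("400 Bad Request: No Bookings Found")
    page: Dict[str, Any] = {
        "pageNumber": number,
        "pageSize": PAGE_SIZE,
        "totalPages": TOTAL_PAGES,
        "totalCount": TOTAL_COUNT,
        "data": [],
    }
    if number < TOTAL_PAGES:
        page["nextPage"] = f"{BASE_URL}?pageNumber={number + 1}&PageSize={PAGE_SIZE}"
    else:
        # The last page points back at itself and has no nextPage key.
        page["lastPage"] = f"{BASE_URL}?pageNumber={number}&PageSize={PAGE_SIZE}"
    return page


def iter_pages(first_url: str, fetch: Callable[[str], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield pages, following nextPage until a response no longer has one."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield page
        url = page.get("nextPage")  # None on the last page, which ends the loop


pages = list(iter_pages(f"{BASE_URL}?PageSize={PAGE_SIZE}&PageNumber=1", fake_fetch))
print([p["pageNumber"] for p in pages])  # [1, 2, 3, 4, 5, 6, 7]
```

Page 8 is never requested, because the loop ends as soon as `nextPage` is absent. That is the behavior a nextPage-based stop condition is meant to produce.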

What I've looked into

Here is the part of the manifest file that handles data retrieval:

  retriever:
    type: SimpleRetriever
    record_selector:
      $ref: "#/definitions/selector"
    paginator:
      type: DefaultPaginator
      page_token_option:
        type: RequestPath
      pagination_strategy:
        type: CursorPagination
        page_size: 100
        cursor_value: '{{ response.get("nextPage", {}) }}'
        stop_condition: '{{ not response.get("nextPage", {}) }}'
      page_size_option:
        inject_into: request_parameter
        type: RequestOption
        field_name: PageSize

I suspect there's a bug in how stop_condition is handled, but I haven't been able to locate the logic responsible for evaluating it.
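As a quick sanity check, the manifest's Jinja expressions can be approximated in plain Python and evaluated against the two sample responses. This is only an approximation (the CDK renders these strings with Jinja, not Python), and `cursor_value` / `stop_condition` here are hypothetical helper names that mirror the manifest fields.

```python
from typing import Any, Dict

NEXT_URL = "https://api.younium.com/Bookings?pageNumber=7&PageSize=100"
PAGE_6 = {"pageNumber": 6, "pageSize": 100, "totalPages": 7, "totalCount": 700,
          "nextPage": NEXT_URL, "data": []}
PAGE_7 = {"pageNumber": 7, "pageSize": 100, "totalPages": 7, "totalCount": 700,
          "lastPage": NEXT_URL, "data": []}


def cursor_value(response: Dict[str, Any]) -> Any:
    # Mirrors: '{{ response.get("nextPage", {}) }}'
    return response.get("nextPage", {})


def stop_condition(response: Dict[str, Any]) -> bool:
    # Mirrors: '{{ not response.get("nextPage", {}) }}'
    return not response.get("nextPage", {})


for resp in (PAGE_6, PAGE_7):
    print(resp["pageNumber"], stop_condition(resp), repr(cursor_value(resp)))
# 6 False 'https://api.younium.com/Bookings?pageNumber=7&PageSize=100'
# 7 True {}
```

On the documented response shapes, the expression itself evaluates as intended. If the real page-7 response matches the sample, the problem likely lies elsewhere, so it may be worth logging the raw page-7 response to confirm it really lacks nextPage.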

Relevant log output

2024-03-22 03:34:02 replication-orchestrator > Attempt 0 to update stream status running null:booking
2024-03-22 03:34:04 source > {"errors":{"message":["No Bookings could be found"]},"type":"https://tools.ietf.org/html/rfc7231#section-6.5.1","title":"One or more validation errors occurred.","status":400,"traceId":"00-50cf35e73247dab83b20351b60a53ef4-cd148a3a7cb05063-01"}
2024-03-22 03:34:04 source > Encountered an exception while reading stream booking
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 121, in read
    yield from self._read_stream(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 200, in _read_stream
    for record in record_iterator:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 310, in _read_full_refresh
    for record_data_or_message in record_data_or_messages:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 448, in read_records
    yield from self._read_pages(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 464, in _read_pages
    request, response = self._fetch_next_page(stream_slice, stream_state, next_page_token)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 490, in _fetch_next_page
    response = self._send_request(request, request_kwargs)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 388, in _send_request
    return backoff_handler(user_backoff_handler)(request, request_kwargs)
  File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 355, in _send
    raise exc
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 352, in _send
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.younium.com/Bookings?pageNumber=8&PageSize=100
2024-03-22 03:34:04 source > Marking stream booking as STOPPED
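One possible workaround is to treat this specific 400 as "no more data" instead of a failure. The sketch below assumes the error body always has the shape shown in the log; `is_end_of_data` is a hypothetical helper. In the low-code CDK, the equivalent would presumably be an error-handler response filter with an IGNORE action, but check the CDK's YAML reference for the exact fields. Note that this masks the symptom rather than fixing the stop condition.

```python
import json

END_OF_DATA_MESSAGE = "No Bookings could be found"


def is_end_of_data(status_code: int, body: str) -> bool:
    """Return True if a response looks like the 'page past the end' error."""
    if status_code != 400:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Expected shape (from the log): {"errors": {"message": ["..."]}, ...}
    errors = payload.get("errors") if isinstance(payload, dict) else None
    messages = errors.get("message", []) if isinstance(errors, dict) else []
    return END_OF_DATA_MESSAGE in messages


LOGGED_BODY = (
    '{"errors":{"message":["No Bookings could be found"]},'
    '"type":"https://tools.ietf.org/html/rfc7231#section-6.5.1",'
    '"title":"One or more validation errors occurred.","status":400,'
    '"traceId":"00-50cf35e73247dab83b20351b60a53ef4-cd148a3a7cb05063-01"}'
)
print(is_end_of_data(400, LOGGED_BODY))  # True
print(is_end_of_data(400, '{"errors": {"message": ["Invalid token"]}}'))  # False
```

Matching on the message text, not just the status code, keeps unrelated 400s (e.g. a malformed request) surfacing as real errors.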


marcosmarxm commented 6 months ago

Thanks for reporting the issue, @magsaas. Younium is a community connector and isn't on the current roadmap for improvements. If you want to contribute a fix, please reach out to me on Slack so I can give you instructions for making the contribution 🎖️

marcosmarxm commented 6 months ago

@ellipsis-dev please make a PR to implement a better stop condition.
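One idea for a "better stop condition" is to combine two signals from the response: a missing nextPage link, or pageNumber having reached totalPages. The Python sketch below illustrates that logic; `should_stop` is a hypothetical name, and this is a suggestion, not the fix that was merged.

```python
from typing import Any, Dict


def should_stop(response: Dict[str, Any]) -> bool:
    """Hypothetical defensive stop check for the Bookings stream."""
    # Signal 1: the API omits nextPage on the last page.
    if not response.get("nextPage"):
        return True
    # Signal 2: the page counters say this page is the last one, even if a
    # nextPage link is (unexpectedly) present.
    page_number = response.get("pageNumber")
    total_pages = response.get("totalPages")
    if isinstance(page_number, int) and isinstance(total_pages, int):
        return page_number >= total_pages
    return False


NEXT = "https://api.younium.com/Bookings?pageNumber=7&PageSize=100"
PAST_END = "https://api.younium.com/Bookings?pageNumber=8&PageSize=100"
print(should_stop({"pageNumber": 6, "totalPages": 7, "nextPage": NEXT}))  # False
print(should_stop({"pageNumber": 7, "totalPages": 7}))  # True
print(should_stop({"pageNumber": 7, "totalPages": 7, "nextPage": PAST_END}))  # True
```

In the manifest, an untested equivalent might be `'{{ not response.get("nextPage") or response["pageNumber"] >= response["totalPages"] }}'`, which assumes pageNumber and totalPages are always present, as they are in the sample responses.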

octavia-squidington-iii commented 1 week ago

At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 180 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.