airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-linkedin-pages] Time bound sync error #42477

Open Rhiyo opened 3 months ago

Rhiyo commented 3 months ago

Connector Name

source-linkedin-pages

Connector Version

1.0.7

What step the error happened?

During the sync

Relevant information

While performing a daily sync, I sometimes get an error saying that the time bounds are incorrect for the follower and share time-bound statistics streams.

The sync runs automatically during the day, and the error doesn't happen every time. Usually, if I run it again manually, it works.

This is with incremental append loads, with timeRangeEnd as the cursor field.

Relevant log output

2024-07-23 22:27:41 source > Encountered an exception while reading stream follower_statistics_time_bound
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 136, in read
    yield from self._read_stream(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 236, in _read_stream
    for record_data_or_message in record_iterator:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/core.py", line 145, in read
    for record_data_or_message in records:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/declarative_stream.py", line 120, in read_records
    yield from self.retriever.read_records(self.get_json_schema(), stream_slice)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 324, in read_records
    for stream_data in self._read_pages(record_generator, self.state, _slice):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 288, in _read_pages
    response = self._fetch_next_page(stream_state, stream_slice, next_page_token)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 263, in _fetch_next_page
    return self.requester.send_request(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/requesters/http_requester.py", line 461, in send_request
    return self._validate_response(response)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/requesters/http_requester.py", line 566, in _validate_response
    raise ReadException(error_message)
airbyte_cdk.sources.declarative.exceptions.ReadException: 
Stream Follower Statistics Time Bound: Start Date must be atmost 12 months before the request date (UTC) and atleast 2 days prior to the request date (UTC). 
See https://bit.ly/linkedin-pages-date-rules
2024-07-23 22:27:41 source > Marking stream follower_statistics_time_bound as STOPPED

Rhiyo commented 3 months ago

I think the start date I had set in the source had just gone too far past the time bound limit. Is there a way to keep this dynamic so this doesn't happen?

Edit: I updated the start date to a more recent date and am still getting the issue.

natikgadzhi commented 3 months ago

Nope. As it stands, config inputs are static. You can change them manually, not dynamically.

We could just say that if the config input is not present, we always default to a dynamically computed start date of 6 months ago or so, but still, that would be a Jinja expression in the connector code.

Want to try and contribute?

avirajsingh7 commented 2 months ago

Hey @natikgadzhi, I can work on this one

One option: whenever we are outside the time bound, we can set start_date to (today's date - 12 months).

natikgadzhi commented 2 months ago

You can try and set something like {{ max(config['start_date'], date("12 months ago")) }} (pseudocode obv). I don't remember how exactly, but I know our Jinja interpolations have a way of getting a date relative to today.
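
For illustration, the logic behind that pseudocode can be sketched in plain Python. This is not the connector's code or the CDK's Jinja syntax: `effective_start_date` is a hypothetical helper, the `YYYY-MM-DD` format for `start_date` is an assumption, and "12 months" is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone


def effective_start_date(config, now=None, max_lookback_days=365):
    """Return the later of the configured start date and the lookback floor.

    Hypothetical helper: assumes config["start_date"] is a "YYYY-MM-DD"
    string and treats "12 months" as 365 days.
    """
    now = now or datetime.now(timezone.utc)
    floor = now - timedelta(days=max_lookback_days)

    configured = config.get("start_date")
    if configured is None:
        # No start date configured: fall back to the dynamic floor.
        return floor

    parsed = datetime.strptime(configured, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Equivalent of max(config['start_date'], date("12 months ago")).
    return max(parsed, floor)
```

With this shape, a stale configured date is silently replaced by the floor, while a recent one is used unchanged.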

natikgadzhi commented 2 months ago

@avirajsingh7 set status to in-progress when you get started on this, please!

avirajsingh7 commented 2 months ago

Hey @natikgadzhi, I believe start_date is limited to 365 days, check here

natikgadzhi commented 2 months ago

Well, yeah, and the API seems to require a start date no more than a year back, but I think that because the sync takes time, by the time the request is performed, the date is too far gone. I.e. a few minutes make a difference.

I would just set the max value to 350 days or something.

Stream Follower Statistics Time Bound: Start Date must be atmost 12 months before the request date (UTC) and atleast 2 days prior to the request date (UTC). See https://bit.ly/linkedin-pages-date-rules
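
A small sketch of why a few minutes can matter, and why a margin such as 350 days would help. The names are hypothetical, the timestamps are made up for the example, and the API's "12 months" rule is approximated as 365 days; the 2-day lower bound is taken from the error message.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "12 months" approximated as 365 days.
API_MAX_LOOKBACK = timedelta(days=365)
# From the error message: start must be at least 2 days before the request.
API_MIN_LAG = timedelta(days=2)


def start_is_accepted(start, request_time):
    """Check a start date against the bounds as of the moment of the request."""
    earliest = request_time - API_MAX_LOOKBACK
    latest = request_time - API_MIN_LAG
    return earliest <= start <= latest


# The start date is computed when the sync begins...
sync_started = datetime(2024, 7, 23, 22, 0, tzinfo=timezone.utc)
# ...but the request for this stream goes out some minutes later.
request_sent = sync_started + timedelta(minutes=27)

exact_start = sync_started - timedelta(days=365)   # right on the limit
padded_start = sync_started - timedelta(days=350)  # with a safety margin

print(start_is_accepted(exact_start, request_sent))   # False
print(start_is_accepted(padded_start, request_sent))  # True
```

A start date that sits exactly on the limit at sync start is already out of bounds when the request is sent, which would match the intermittent failures described above.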