airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source Jira: connection gets stuck for a long time on `issue_worklogs` stream #30245

Open crabio opened 1 year ago

crabio commented 1 year ago

Connector Name

source-jira

Connector Version

v0.3.13

What step the error happened?

During the sync

Relevant information

I configured a connection from Jira to Snowflake with incremental replication of the issues and issue_worklogs streams. Everything works, but the connector seems to wait for the first record from issue_worklogs: it was stuck for about 50 minutes and returned only 1 record. This has happened on every sync after the first full replication.

Relevant log output

2023-09-07 07:00:59 source > Marking stream issue_worklogs as STARTED
2023-09-07 07:00:59 source > Syncing stream: issue_worklogs 
2023-09-07 07:04:55 source > Marking stream issue_worklogs as RUNNING
2023-09-07 07:51:00 source > Read 1 records from issue_worklogs stream
2023-09-07 07:51:00 source > Marking stream issue_worklogs as STOPPED
2023-09-07 07:51:00 source > Finished syncing issue_worklogs
2023-09-07 07:51:00 source > SourceJira runtimes:
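
To put a number on the stall, the pause between consecutive log lines can be computed from their timestamps. The following is a minimal sketch, assuming every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the output above; the function name `largest_gap` is hypothetical and not part of Airbyte.

```python
from datetime import datetime

# First four lines of the log output quoted in this issue.
LOG = """\
2023-09-07 07:00:59 source > Marking stream issue_worklogs as STARTED
2023-09-07 07:00:59 source > Syncing stream: issue_worklogs
2023-09-07 07:04:55 source > Marking stream issue_worklogs as RUNNING
2023-09-07 07:51:00 source > Read 1 records from issue_worklogs stream
"""


def largest_gap(log_text):
    """Return (seconds, line_before, line_after) for the longest pause
    between two consecutive timestamped lines (hypothetical helper)."""
    entries = []
    for line in log_text.splitlines():
        try:
            # The timestamp occupies the first 19 characters of each line.
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        entries.append((ts, line))

    best = (0.0, None, None)
    for (t0, l0), (t1, l1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best


seconds, before, after = largest_gap(LOG)
print(f"{seconds / 60:.1f} min between:\n  {before}\n  {after}")
```

For this log the longest pause falls between the RUNNING marker and the single record being read.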

darynaishchenko commented 12 months ago

Connector Version v0.3.13

@crabio Did you try it on the latest version (0.6.0)?

crabio commented 11 months ago

Yes, it seems better starting from 0.5.0! Thank you!

crabio commented 11 months ago

Uh, no, the problem is still present in version 0.6.3:

2023-09-20 07:58:50 source > Marking stream issue_worklogs as RUNNING
2023-09-20 08:38:54 source > Backing off _send(...) for 5.0s (requests.exceptions.ConnectionError: ('Connection aborted.', TimeoutError(110, 'Connection timed out')))
2023-09-20 08:38:54 source > Caught retryable error '('Connection aborted.', TimeoutError(110, 'Connection timed out'))' after 1 tries. Waiting 5 seconds then retrying...

darynaishchenko commented 11 months ago

@crabio, the logs above are produced by the backoff strategy, which is expected behavior when the connector receives a retryable error from the API, for example when it hits rate limits.
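
For readers unfamiliar with the pattern, a retry-with-backoff loop can be sketched roughly as follows. This is an illustration only, not the connector's actual code: `RetryableError`, `call_with_backoff`, and all parameter values are hypothetical, and the doubling delay is an assumption.

```python
import time


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, rate limits)."""


def call_with_backoff(func, max_tries=5, base_delay=5.0, factor=2.0,
                      sleep=time.sleep):
    """Call func(), retrying on RetryableError with a growing delay.

    The delay starts at base_delay seconds and is multiplied by factor
    after each failed attempt. The last failure is re-raised.
    """
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except RetryableError as exc:
            if attempt == max_tries:
                raise
            print(f"Caught retryable error {exc!r} after {attempt} tries. "
                  f"Waiting {delay} seconds then retrying...")
            sleep(delay)
            delay *= factor
```

With this shape a retry message is only logged once a request has already failed, so a long silent period before the first such message is time spent waiting on the request itself rather than sleeping between retries.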

crabio commented 11 months ago

Yes, sure :) But the issue is still present in 0.6.3.

I had a successful load today, with this log:

2023-09-21 00:07:09 destination > INFO i.a.i.d.s.SnowflakeInternalStagingSqlOperations(uploadRecordsToStage):115 Successfully loaded records to stage 2023/09/21/00/A3C5C7CD-9BE3-4355-BCB3-C9CA0CAA0A71/ with 0 re-attempt(s)
2023-09-21 00:07:10 destination > INFO i.a.i.d.r.FileBuffer(deleteFile):109 Deleting tempFile data d45ec051-5e56-40ae-99e6-9455c863100514137509493976350350.csv.gz
2023-09-21 00:07:10 destination > INFO i.a.i.d.GlobalMemoryManager(free):86 Freeing 2138 bytes..
2023-09-21 00:07:10 INFO i.a.w.g.ReplicationWorkerHelper(processMessageFromDestination):228 - State in DefaultReplicationWorker from destination: io.airbyte.protocol.models.AirbyteMessage@d051667[type=STATE,log=<null>,spec=<null>,connectionStatus=<null>,catalog=<null>,record=<null>,state=io.airbyte.protocol.models.AirbyteStateMessage@58ea7ed0[type=STREAM,stream=io.airbyte.protocol.models.AirbyteStreamState@5f2f1f34[streamDescriptor=io.airbyte.protocol.models.StreamDescriptor@3904eb6c[name=issue_comments,namespace=<null>,additionalProperties={}],streamState={"updated":"2023-09-20T23:51:29.617000+00:00"},additionalProperties={}],global=<null>,data={"issue_comments":{"updated":"2023-09-20T23:51:29.617000+00:00"},"issues":{"updated":"2023-09-20T23:51:38.871000+00:00"},"issue_worklogs":{"updated":"2022-12-06T13:13:15.446000+00:00"}},additionalProperties={}],trace=<null>,control=<null>,additionalProperties={}]
2023-09-21 00:17:43 source > Marking stream issue_worklogs as STARTED
2023-09-21 00:17:43 source > Syncing stream: issue_worklogs 
2023-09-21 00:29:58 source > Marking stream issue_worklogs as RUNNING
2023-09-21 01:22:06 source > Read 1 records from issue_worklogs stream
2023-09-21 01:22:06 source > Marking stream issue_worklogs as STOPPED
2023-09-21 01:22:10 INFO i.a.w.p.KubePodProcess(close):787 - (pod: dwh-airbyte-stable / source-jira-read-778-0-lzjso) - Closed all resources for pod
2023-09-21 01:22:18 INFO i.a.w.t.TemporalAttemptExecution(get):138 - Cloud storage job log path: /workspace/778/0/logs.log
2023-09-21 01:22:24 normalization > Running: transform-config --config destination_config.json --integration-type snowflake --out /config
2023-09-21 01:22:06 source > Finished syncing issue_worklogs
2023-09-21 01:22:06 source > SourceJira runtimes:
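
One detail visible in the state message above is that the `issue_worklogs` cursor is much older than the cursors of the other two streams. Whether that is related to the long wait is not established in this thread. The sketch below only compares the cursor values copied from that state message; `cursor_lags` is a hypothetical helper, not an Airbyte API.

```python
from datetime import datetime

# Per-stream state copied from the STATE message in the log above.
state = {
    "issue_comments": {"updated": "2023-09-20T23:51:29.617000+00:00"},
    "issues": {"updated": "2023-09-20T23:51:38.871000+00:00"},
    "issue_worklogs": {"updated": "2022-12-06T13:13:15.446000+00:00"},
}


def cursor_lags(state, cursor_field="updated"):
    """Return how far each stream's cursor trails the newest cursor."""
    cursors = {
        name: datetime.fromisoformat(stream_state[cursor_field])
        for name, stream_state in state.items()
    }
    newest = max(cursors.values())
    return {name: newest - ts for name, ts in cursors.items()}


lags = cursor_lags(state)
most_stale = max(lags, key=lags.get)
print(f"{most_stale} trails the newest cursor by {lags[most_stale].days} days")
```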