Open sherifnada opened 2 years ago
@sherifnada do you think the performance issue is on the source side?
As I understand it, @alafanechere discusses that here: https://github.com/airbytehq/airbyte/issues/12671#issuecomment-1119632804
@lazebnyi this should not block certifications atm
Team, let's pick this back up alongside an investigation of: https://github.com/airbytehq/oncall/issues/274. Please reach out to @sherifnada when you dig in to gain access to the impacted workspace.
@sherifnada Can I have the creds to this high volume data account to proceed with tests?
Another complaint in Discourse: https://airbyte7538.zendesk.com/agent/tickets/1459
And from another Intercom issue on GitHub, https://github.com/airbytehq/airbyte/issues/12506, it looks like the contacts stream took 15 hours to finish. In this case that stream holds the majority of the data (about 1 million records).
2022-05-02 00:53:35 source > Read 1002750 records from contacts stream
2022-05-02 00:53:35 source > Finished syncing contacts
2022-05-02 00:53:35 source > SourceIntercom runtimes:
Syncing stream admins 0:00:02.460130
Syncing stream contacts 15:01:43.301241
2022-05-02 00:53:35 source > Syncing stream: tags
2022-05-02 00:53:37 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):301 - Records read: 1004000 (1 GB)
2022-05-02 00:53:37 source > Read 1268 records from tags stream
2022-05-02 00:53:37 source > Finished syncing tags
2022-05-02 00:53:37 source > SourceIntercom runtimes:
Syncing stream admins 0:00:02.460130
Syncing stream contacts 15:01:43.301241
Syncing stream tags 0:00:01.593165
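To put the log above in perspective, here is a quick back-of-the-envelope calculation of the throughput for the contacts stream. The record count and runtime are copied from the log; the two-hour target is only an assumption based on the "couple of hours" goal in the issue description.

```python
from datetime import timedelta

# Figures copied from the log excerpt above.
records = 1_002_750
runtime = timedelta(hours=15, minutes=1, seconds=43.301241)

observed_rate = records / runtime.total_seconds()

# Assumed target (not from the log): the same stream finishing in two hours.
target = timedelta(hours=2)
required_rate = records / target.total_seconds()

print(f"observed: {observed_rate:.1f} records/s")
print(f"required for a 2 h sync: {required_rate:.1f} records/s")
print(f"speed-up needed: {required_rate / observed_rate:.1f}x")
```

That works out to roughly 18.5 records per second, so hitting a two-hour sync for this volume would need about a 7.5x speed-up.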
Zendesk ticket #1459 has been linked to this issue.
Comment made from Zendesk by Marcos Marx on 2022-07-05 at 12:28:
Hello Alelxis, there is a GitHub issue, https://github.com/airbytehq/airbyte/issues/11595, about improving Intercom speed. I looked at the code and this stream doesn't have any special implementation compared to other streams (companies, tags, segments). In any case, I'll get back to you when the issue is resolved.
We disabled incremental for the contacts stream, swapped from /contacts/search (POST) to /contacts (GET), and this solves the request throttling.
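For anyone wanting to try the same swap, here is a minimal sketch of a full-refresh read that walks GET /contacts page by page. It is not the connector's actual code. `fetch_page` is a hypothetical helper that performs the HTTP call, and the response shape (`data`, `pages.next.starting_after`) is an assumption that should be checked against the Intercom API reference.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_contacts(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every contact by following the cursor from page to page.

    `fetch_page(cursor)` is a hypothetical helper: it should call
    GET /contacts (passing `starting_after=cursor` when cursor is set)
    and return the decoded JSON body.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        # Assumed shape: {"data": [...], "pages": {"next": {"starting_after": "..."}}}
        next_page = (page.get("pages") or {}).get("next") or {}
        cursor = next_page.get("starting_after")
        if not cursor:
            break


# Usage with canned pages instead of real HTTP calls.
def fake_fetch(cursor: Optional[str]) -> Page:
    pages = {
        None: {"data": [{"id": "a"}, {"id": "b"}],
               "pages": {"next": {"starting_after": "c1"}}},
        "c1": {"data": [{"id": "c"}], "pages": {}},
    }
    return pages[cursor]


ids = [contact["id"] for contact in iter_contacts(fake_fetch)]
print(ids)
```

The trade-off is that without incremental every run re-reads all contacts, so this only helps if the GET endpoint is throttled less than the search endpoint.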
How many records do you have for contacts stream?
More than 9 GB, according to the logs.
Potentially, one promising direction here is to use the export functionality of the Intercom API. More information here: https://developers.intercom.com/intercom-api-reference/reference/export-job-model
@sherifnada We are currently facing this issue with company_segments taking at least 12 hours
@sherifnada The link https://developers.intercom.com/intercom-api-reference/reference/export-job-model is not available. Instead, this one works fine: https://developers.intercom.com/intercom-api-reference/reference/the-export-job-model
The Export Jobs are available for the Messages stream only and are used along with the Unstable API version.
More context here: https://github.com/airbytehq/airbyte/issues/9188#issuecomment-1422553673
Unfortunately, for now we cannot use it for all available streams.
@mrhallak
As for the company_segments stream, it is slow by nature because it depends on the Companies stream. Neither of them allows filtering records on the API side, so we have to fetch all of the data from both and then filter for the latest records. There is no workaround for this right now.
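To illustrate what "fetch everything, then filter" means in practice, here is a minimal sketch of client-side incremental filtering. The function name, the `updated_at` cursor field, and the sample records are all hypothetical, not taken from the connector.

```python
from typing import Any, Dict, Iterable, Iterator


def filter_newer(
    records: Iterable[Dict[str, Any]],
    cursor_field: str,
    state_value: int,
) -> Iterator[Dict[str, Any]]:
    """Keep only records whose cursor value is newer than the saved state.

    Every record still has to be downloaded first; the filter only
    reduces what is emitted downstream, not what is requested.
    """
    for record in records:
        if record.get(cursor_field, 0) > state_value:
            yield record


# Hypothetical records with Unix timestamps in `updated_at`.
all_segments = [
    {"id": "s1", "updated_at": 1_650_000_000},
    {"id": "s2", "updated_at": 1_660_000_000},
    {"id": "s3", "updated_at": 1_670_000_000},
]
latest = list(filter_newer(all_segments, "updated_at", state_value=1_655_000_000))
print([r["id"] for r in latest])
```

The number of API calls stays the same regardless of the state value, which is why the sync time does not shrink on incremental runs.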
The general speed of the connector has already been tuned to its maximum, considering rate limits and the caching strategy. The other option is to make dependent streams call their endpoints in async mode (in theory, of course).
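As a rough idea of what the async option could look like, here is a sketch that fetches child records for many parents concurrently while capping the number of in-flight requests. It does not use any Airbyte CDK API. `fetch_children`, `fetch_one`, and `max_concurrency` are hypothetical names, and a concurrency cap alone is not a substitute for respecting the API's rate limits.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Sequence

Record = Dict[str, Any]


async def fetch_children(
    parent_ids: Sequence[str],
    fetch_one: Callable[[str], Awaitable[List[Record]]],
    max_concurrency: int = 5,
) -> List[Record]:
    """Fetch child records for each parent, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(parent_id: str) -> List[Record]:
        # The semaphore bounds how many requests run at the same time.
        async with semaphore:
            return await fetch_one(parent_id)

    # gather() returns results in the same order as parent_ids.
    batches = await asyncio.gather(*(guarded(pid) for pid in parent_ids))
    return [record for batch in batches for record in batch]


# Usage with a stand-in for the real HTTP call.
async def fake_fetch_segments(company_id: str) -> List[Record]:
    await asyncio.sleep(0.01)  # simulates network latency
    return [{"company_id": company_id, "segment": "s1"}]


segments = asyncio.run(
    fetch_children(["c1", "c2", "c3"], fake_fetch_segments, max_concurrency=2)
)
print([s["company_id"] for s in segments])
```

Whether this helps depends on how much of the sync time is spent waiting on latency versus being throttled by the rate limit.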
Tell us about the problem you're trying to solve
A user was trying to sync a high volume instance of Intercom (logs below). The connector spent many hours (50+ hours) syncing data from the contacts stream. This is a bad user experience as it does not allow them to make use of the product and data quickly. logs-97415.txt
Note that this issue is not only about Intercom; it should also be used as a learning opportunity for how this can be done at the CDK level, as described in airbytehq/airbyte-internal-issues#504
Describe the solution you’d like
I would like us to find a way to speed up Intercom syncs coming from high-volume instances such as this one. Ideally, a sync takes no longer than a couple of hours in 99% of cases.