airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

New Integration Request: Instagram Insights (Media, Story and User) #1814

Closed jim-barlow closed 3 years ago

jim-barlow commented 3 years ago

Tell us about the new integration you’d like to have

Which source and which destination? Which frequency?

Instagram insights to Google BigQuery, every <10 minutes(!). It's actually from the Facebook Graph API so it might be possible to reuse some aspects of the Facebook Marketing Source Connector.

Describe the context around this new integration

Which team in your company wants this integration, what for? This helps us understand the use case.

I have already built this as a custom extractor for a client using Google Cloud Functions - the reason is that:

Describe the alternative you are considering or using

What are you considering doing if you don’t have this integration through Airbyte?

It was not viable to do this via e.g. Rivery as the sheer number of data transfers would have been prohibitively expensive, and the multi-account to single business account makes it incompatible with other tools. It feels like a great use-case for us to try out Airbyte, and I would be happy to share the (Python) code as deployed.

I am already running it via a set of Cloud Functions (orchestrated via PubSub from another Cloud Function so each invocation only queries data from a single account); however, there is an ongoing cost associated with this and I would rather move it to a dedicated platform and monitor/manage it alongside other flows instead of as custom code.

michel-tricot commented 3 years ago

Hi @jimbeepbeep thank you for this issue! Looks like a great use-case for Airbyte indeed. Let me contact you on slack so we can discuss how to move forward on this integration.

jim-barlow commented 3 years ago

Great to catch up on this @michel-tricot, let me know if there's anything we can do to get it kicked off.

sherifnada commented 3 years ago

Seems like instagram docs have moved. new location here

sherifnada commented 3 years ago

As far as I can tell from looking at the API docs we should be able to do this once we have a few prerequisites:

  1. An Instagram Business Account or Instagram Creator Account
  2. A Facebook Page connected to that account
  3. A Facebook Developer account that can perform Tasks on that Page
  4. A registered Facebook App with Basic settings configured
  5. Populate data for all the relevant streams (mentioned by jim above)

then we can rock and roll.

jim-barlow commented 3 years ago

That sounds about right @sherifnada, they make you do a merry little dance before you can get any data, and the error messages can be a little confusing if not downright misleading! One really great resource here is the Graph API Explorer, which is also the easiest place to generate an access token for testing. In order to access the data from multiple Instagram Accounts (which is essential and would also be pretty unique to Airbyte), you need to generate a User Token with a set of permissions. I used the following (although in reality you probably don't need them all):

pages_manage_cta
pages_manage_instant_articles
pages_show_list
ads_management
business_management
instagram_basic
instagram_manage_comments
instagram_manage_insights
pages_read_engagement
pages_manage_metadata
pages_read_user_content
pages_manage_ads
pages_manage_posts
pages_manage_engagement
public_profile
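A quick way to see which of these permissions a token actually carries is to compare the list against what the Graph API reports for the token. The sketch below assumes a response shaped like the output of `GET /me/permissions` (a `data` list of `permission`/`status` pairs); the function name, the sample response and the "required" set are made up for illustration and are not a statement of the minimum the connector needs.

```python
from typing import Dict, Iterable, Set


def missing_permissions(permissions_response: Dict, required: Iterable[str]) -> Set[str]:
    """Return the required permissions that the token has not been granted."""
    granted = {
        entry["permission"]
        for entry in permissions_response.get("data", [])
        if entry.get("status") == "granted"
    }
    return set(required) - granted


# Hypothetical response body, shaped like the output of GET /me/permissions.
sample_response = {
    "data": [
        {"permission": "instagram_basic", "status": "granted"},
        {"permission": "pages_show_list", "status": "granted"},
        {"permission": "instagram_manage_insights", "status": "declined"},
    ]
}

# Assumed subset of the list above, chosen only to demonstrate the check.
required = ["instagram_basic", "instagram_manage_insights", "pages_show_list"]

print(sorted(missing_permissions(sample_response, required)))
# prints ['instagram_manage_insights']
```

Running this against a freshly generated token before configuring a source can save a round of confusing error messages.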

As an example, the query I use to get all of the Instagram Accounts which I have access to, and which are linked to my client's Facebook Business Account (you can see this in the Facebook App Dashboard in the Basic -> Verification section) is:

FACEBOOK_BUSINESS_ACCOUNT/instagram_business_accounts?fields=id,ig_id,username&limit=500

Note that this is prepended with https://graph.facebook.com/{graph-api-version}/ i.e. https://graph.facebook.com/v9.0/ and appended with &access_token={access_token}
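To make the prepend/append steps concrete, here is a minimal sketch that assembles the full request URL from the pieces described above. `build_graph_url` is a hypothetical helper name, `FACEBOOK_BUSINESS_ACCOUNT` and `TOKEN` are placeholders, and `v9.0` is simply the version quoted in this comment.

```python
from urllib.parse import urlencode


def build_graph_url(path: str, params: dict, access_token: str, version: str = "v9.0") -> str:
    """Prepend the Graph API base URL and append the access token to a query."""
    # safe="," keeps the comma-separated field list readable in the final URL.
    query = urlencode({**params, "access_token": access_token}, safe=",")
    return f"https://graph.facebook.com/{version}/{path.lstrip('/')}?{query}"


url = build_graph_url(
    "FACEBOOK_BUSINESS_ACCOUNT/instagram_business_accounts",
    {"fields": "id,ig_id,username", "limit": 500},
    access_token="TOKEN",  # placeholder, never hard-code a real token
)
print(url)
```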

We had numerous issues figuring out the right setup to get this working so let me know if you have any questions! Also it's interesting to note that you can get a lot of data from field expansion which could minimise the number of endpoints you need to hit, you just need to process the paginated responses which I'm sure you guys are pretty pro at.
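For reference, processing the paginated responses can be sketched roughly as below. It assumes the usual Graph API response shape, with records under `data` and the next page URL under `paging.next`. `iter_records` and the injected `fetch_json` callable are hypothetical names; a real client would pass a function that performs the HTTP GET and returns the decoded JSON.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_records(first_url: str, fetch_json: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every record from a paginated response by following paging.next."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("data", [])
        # The last page has no "next" link, which ends the loop.
        url = page.get("paging", {}).get("next")


# Fake pages standing in for HTTP responses, keyed by URL.
FAKE_PAGES = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "3"}], "paging": {}},
}

records = list(iter_records("page1", FAKE_PAGES.__getitem__))
print(records)
# prints [{'id': '1'}, {'id': '2'}, {'id': '3'}]
```

Keeping the fetch function injectable makes the loop easy to test without touching the network.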

sherifnada commented 3 years ago

Thanks for the context @jimbeepbeep! We'll realistically be able to start work on this in the coming couple of weeks. We'll begin work on the account setup/data population etc next week

jim-barlow commented 3 years ago

That's great @sherifnada, let me know if you need anything else. Our experience was that getting the accounts set up correctly was one of the biggest headaches and the Facebook/Instagram terminologies and requirements can be really confusing. If you're having any issues then please contact me and I can probably get you set up with access to a few shared accounts for testing purposes.

yevhenii-ldv commented 3 years ago

Scoping results for this task:

  1. Having investigated the documentation of the Facebook Graph API related to Instagram, I can confirm the information that was indicated above, we can read the data for the following streams using Facebook Graph API:

  2. I tested requests with Airbyte's existing test Facebook business account and found no Instagram account linked to it. I agree with @sherifnada: before starting development, we need to link an Instagram account to it and fill it with data.

Assuming the second step is complete, my estimate of the time to implement this connector with rate limiting handling is 4 days.

sherifnada commented 3 years ago

Breakdown:

sherifnada commented 3 years ago

@yevhenii-ldv could you create tickets for the above breakdown and add to the connector roadmap?

jim-barlow commented 3 years ago

As confirmed to Sherif on Slack, I can help with the test account as setting up all of the links is a bit of a nightmare! I'll DM the details there.

sherifnada commented 3 years ago

@jimbeepbeep small question on the UX for this connector: would you expect each instance of the connector to sync data for one instagram account, or would you want a single connector instance to pull data for multiple accounts?

michel-tricot commented 3 years ago

by account do you mean credentials or user handle? if it is credentials, we need to be consistent with our definition of connector and it should only be one per connector instance.

jim-barlow commented 3 years ago

@sherifnada that's exactly the right question. A single set of credentials can have multiple accounts associated, and the precise problem we had with existing approaches was that they just worked for a single account. We definitely need multiple accounts but not multiple tables. Currently we stream the API response as JSON into BQ and then decode into a nested table using a custom function.

sherifnada commented 3 years ago

@jimbeepbeep Question about User Insights: which metric/period combinations make sense? My current inclination is to expose all of them under one table which has columns reflecting each possible metric/period combination. The schema of the table would look something like:

user_id
reach_1d
reach_7d
reach_28d
impressions_1d
impressions_7d
impressions_28d
website_clicks
etc...

WDYT about this breakdown?
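A rough sketch of how such a wide row could be built from per-metric insight records is shown below. It assumes each record carries `name`, `period` and a `values` list, and that periods are named `day`, `week` and `days_28`; the function name, the suffix mapping and the sample numbers are illustrative only, not the connector's actual implementation.

```python
from typing import Dict, List

# Assumed mapping from API period names to the column suffixes proposed above.
PERIOD_SUFFIX = {"day": "1d", "week": "7d", "days_28": "28d"}


def flatten_user_insights(user_id: str, insights: List[Dict]) -> Dict:
    """Collapse per-metric/per-period records into one row keyed by metric_period."""
    row = {"user_id": user_id}
    for item in insights:
        suffix = PERIOD_SUFFIX.get(item["period"], item["period"])
        values = item.get("values") or []
        # Take the most recent value; None marks a metric with no data.
        row[f"{item['name']}_{suffix}"] = values[-1]["value"] if values else None
    return row


sample = [
    {"name": "reach", "period": "day", "values": [{"value": 120}]},
    {"name": "reach", "period": "week", "values": [{"value": 640}]},
    {"name": "impressions", "period": "days_28", "values": [{"value": 9100}]},
]

print(flatten_user_insights("17841400000000000", sample))
# prints {'user_id': '17841400000000000', 'reach_1d': 120, 'reach_7d': 640, 'impressions_28d': 9100}
```

Metrics that only exist for a single period (such as website_clicks in the list above) could drop the suffix; this sketch keeps it for uniformity.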

jim-barlow commented 3 years ago

@sherifnada yes that sounds like a good approach.

jim-barlow commented 3 years ago

@sherifnada how are you guys going on this? I need to update some of the metrics and accounts which my (hopefully stop-gap!) extractor is pulling and wondered if it would be helpful for me to test anything with real data?

sherifnada commented 3 years ago

@jimbeepbeep I'd say we're ~85% done with the connector and expect to merge it in the next couple of days. Would that work for your schedule?

jim-barlow commented 3 years ago

@sherifnada you guys are amazing, thanks!

sherifnada commented 3 years ago

@jimbeepbeep hi!

We just added this connector and merged it into master.

The connector will be available in the next release of Airbyte (we usually release every Tuesday). If you can't wait and want to try it out now in a running Airbyte instance, add the connector as described in https://docs.airbyte.io/integrations/custom-connectors#adding-your-connectors-in-the-ui . The information for the connector is:

Display Name: Instagram API
Docker repository name: airbyte/source-instagram
Docker image tag: 0.1.0
Documentation URL: hub.docker.com/r/airbyte/source-instagram

Please let us know if you encounter any issues or have any questions.

Enjoy!

jim-barlow commented 3 years ago

Amazing, thanks team. We'll get on testing this today!

jim-barlow commented 3 years ago

OK @sherifnada I have added the connector according to your instructions above, which seem to work fine. However when I try and test the connection using the secret which we use in production (stored in Google Secrets Manager) I get the following error:

image

Failed Logs:

2021-03-16 09:35:26 INFO (/tmp/workspace/67/0) WorkerRun(call):62 - Executing worker wrapper. Airbyte version: AIRBYTE_VERSION
2021-03-16 09:35:26 INFO (/tmp/workspace/67/0) TemporalAttemptExecution(get):79 - Executing worker wrapper. Airbyte version: AIRBYTE_VERSION
2021-03-16 09:35:26 INFO (/tmp/workspace/67/0) LineGobbler(voidCall):69 - Checking if airbyte/source-instagram:0.1.0 exists...
2021-03-16 09:35:26 INFO (/tmp/workspace/67/0) LineGobbler(voidCall):69 - airbyte/source-instagram:0.1.0 was found locally.
2021-03-16 09:35:26 DEBUG (/tmp/workspace/67/0) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/67/0 --network host airbyte/source-instagram:0.1.0 check --config source_config.json
2021-03-16 09:35:27 ERROR (/tmp/workspace/67/0) DefaultAirbyteStreamFactory(internalLog):108 - Error: 190, Error validating access token: The session is invalid because the user logged out.
2021-03-16 09:35:27 ERROR (/tmp/workspace/67/0) DefaultAirbyteStreamFactory(internalLog):108 - Check failed
2021-03-16 09:35:28 DEBUG (/tmp/workspace/67/0) DefaultCheckConnectionWorker(run):93 - Check connection job subprocess finished with exit code 0
2021-03-16 09:35:28 DEBUG (/tmp/workspace/67/0) DefaultCheckConnectionWorker(run):94 - Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@1f8b6e17[status=failed,message=Error: 190, Error validating access token: The session is invalid because the user logged out.]

If I log into the Graph API Explorer and generate another User Token with these permissions :

- pages_manage_cta
- pages_manage_instant_articles
- pages_show_list
- ads_management
- business_management
- instagram_basic
- instagram_manage_comments
- instagram_manage_insights
- pages_read_engagement
- pages_manage_metadata
- pages_read_user_content
- pages_manage_ads
- pages_manage_posts
- pages_manage_engagement
- public_profile

'Testing Connection' takes a little bit longer, then I get the following error:

image

With no logs available to share. Let me know any steps you think I might need to debug!

jim-barlow commented 3 years ago

I have checked through the server logs and I think I've found the issue. I'm not sure where to enter the account_id (and also which account_id it is), but that seems to be the root cause of the problem:

2021-03-16 09:46:20 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/sources/create
2021-03-16 09:46:20 DEBUG cache hit: airbyte/source-instagram:0.1.0
2021-03-16 09:46:20 DEBUG Known exception
io.airbyte.server.errors.KnownException: The provided configuration does not fulfill the specification. Errors: json schema validation failed. 
errors: $.account_id: is missing but it is required 
schema: 
{
  "type" : "object",
  "title" : "Source Instagram",
  "$schema" : "http://json-schema.org/draft-07/schema#",
  "required" : [ "account_id", "access_token" ],
  "properties" : {
    "start_date" : {
      "type" : "string",
      "pattern" : "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$",
      "examples" : [ "2020-09-25T00:00:00Z" ],
      "description" : "The date from which you'd like to replicate data for User Insights, in the format YYYY-MM-DDT00:00:00Z. All data generated after this date will be replicated."
    },
    "access_token" : {
      "type" : "string",
      "description" : "The value of the access token generated. See the <a href=\"https://docs.airbyte.io/integrations/sources/instagram\">docs</a> for more information",
      "airbyte_secret" : true
    }
  },
  "additionalProperties" : false
} 
object: 
{
  "start_date" : "2021-01-01T00:00:00Z",
  "access_token" : "[JB_REDACTED]"
}

yevhenii-ldv commented 3 years ago

@jimbeepbeep hi! Thank you very much for your attention and the information provided. You are absolutely right, the problem lies precisely with the account_id. I have already created an issue for your request and we will solve this problem in the near future (within the next few hours, I think). As soon as we update the connector, I will let you know in the comments on this issue.

Thanks a lot!
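One reading of the schema in the log above is that no configuration can satisfy it: account_id is listed under "required", but it is not declared under "properties" while "additionalProperties" is false. The sketch below illustrates that reading with a deliberately tiny checker that covers only those two keywords. It is not a JSON Schema validator, `check_config` is a hypothetical name, and the second error message is worded for this example rather than copied from Airbyte.

```python
from typing import Dict, List


def check_config(schema: Dict, config: Dict) -> List[str]:
    """Check only 'required' and 'additionalProperties: false' (illustration only)."""
    errors = []
    for key in schema.get("required", []):
        if key not in config:
            errors.append(f"$.{key}: is missing but it is required")
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for key in config:
            if key not in allowed:
                errors.append(f"$.{key}: is not defined in the schema")
    return errors


# Trimmed copy of the spec from the server log above.
SPEC = {
    "required": ["account_id", "access_token"],
    "properties": {"start_date": {"type": "string"}, "access_token": {"type": "string"}},
    "additionalProperties": False,
}

without_id = {"start_date": "2021-01-01T00:00:00Z", "access_token": "TOKEN"}
with_id = {**without_id, "account_id": "123"}

print(check_config(SPEC, without_id))
# prints ['$.account_id: is missing but it is required']
print(check_config(SPEC, with_id))
# prints ['$.account_id: is not defined in the schema']
```

Under that reading, a fix has to change the spec itself (either declare account_id or stop requiring it) rather than the submitted configuration.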

yevhenii-ldv commented 3 years ago

@jimbeepbeep We have fixed the bug and an updated version of the Instagram connector is now available. Could you try the new Instagram connector version and let us know if that works? ;)

jim-barlow commented 3 years ago

Thanks @yevhenii-ldv - what do I need to do to get the revised version please? Is there an increment to the docker image tag or do I need to upgrade as per this guide? I have two instances running on different VMs on Google Cloud - the one I used previously still shows the connection I manually created (which still does not work with valid credentials), and I can't see Instagram as an option in either. I manually created another connection with 0.1.0 image tag and get the same error in the server logs:

Caused by: io.airbyte.validation.json.JsonValidationException: json schema validation failed. 
errors: $.account_id: is missing but it is required 

jim-barlow commented 3 years ago

OK I created an entirely new VM and instance and the Instagram connector did show up, however it still fails. Posting the entire server log as there are a couple of references to Instagram in there:

    ___    _      __          __
   /   |  (_)____/ /_  __  __/ /____
  / /| | / / ___/ __ \/ / / / __/ _ \
 / ___ |/ / /  / /_/ / /_/ / /_/  __/
/_/  |_/_/_/  /_.___/\__, /\__/\___/
                    /____/
--------------------------------------
 Now ready at http://localhost:8000/
--------------------------------------
Version: 0.17.1-alpha

2021-03-17 06:19:34 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/workspaces/get
2021-03-17 06:19:36 INFO  REQ 172.18.0.1 POST 200 /api/v1/workspaces/get - {"workspaceId":"5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6"}
2021-03-17 06:19:36 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/web_backend/connections/list
2021-03-17 06:19:36 INFO  REQ 172.18.0.1 POST 200 /api/v1/web_backend/connections/list - {"workspaceId":"5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6"}
2021-03-17 06:19:36 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/source_definitions/list
2021-03-17 06:19:37 INFO  REQ 172.18.0.1 POST 200 /api/v1/source_definitions/list - {"workspaceId":"5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6"}
2021-03-17 06:19:41 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/sources/list
2021-03-17 06:19:41 INFO  REQ 172.18.0.1 POST 200 /api/v1/sources/list - {"workspaceId":"5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6"}
2021-03-17 06:19:54 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/source_definition_specifications/get
2021-03-17 06:19:54 DEBUG cache miss: airbyte/source-instagram:0.1.0
2021-03-17 06:19:54 INFO  enqueuing pending job for scope: airbyte/source-instagram:0.1.0
2021-03-17 06:19:54 INFO  Waiting for job id: 3
2021-03-17 06:20:01 INFO  REQ 172.18.0.1 POST 200 /api/v1/source_definition_specifications/get - {"sourceDefinitionId":"6acf6b55-4f1e-4fca-944e-1a3caef8aba8"}
2021-03-17 06:20:31 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/scheduler/sources/check_connection
2021-03-17 06:20:32 INFO  enqueuing pending job for scope: 
2021-03-17 06:20:32 INFO  Waiting for job id: 4
2021-03-17 06:20:48 INFO  REQ 172.18.0.1 POST 200 /api/v1/scheduler/sources/check_connection - {"sourceDefinitionId":"6acf6b55-4f1e-4fca-944e-1a3caef8aba8","connectionConfiguration":"REDACTED"}
2021-03-17 06:20:48 INFO  REQ 172.18.0.1 OPTIONS 200 /api/v1/sources/create
2021-03-17 06:20:48 DEBUG cache hit: airbyte/source-instagram:0.1.0
2021-03-17 06:20:48 WARN  Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2021-03-17 06:20:48 WARN  Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2021-03-17 06:20:48 DEBUG Known exception
io.airbyte.server.errors.KnownException: The provided configuration does not fulfill the specification. Errors: json schema validation failed. 
errors: $.account_id: is missing but it is required 

Please let me know next steps. Also, can you please add @joseignaciorm to this issue as he'll be taking the lead on this.

sherifnada commented 3 years ago

@jimbeepbeep @joseignaciorm

apologies for the mix up on how to upgrade the connector here!

You'll need to upgrade your connector to version 0.1.1 to pick up the bugfix. To upgrade your connector version, go to the admin panel in the left hand side of the UI, find the instagram connector in the list, and input the latest connector version.

Please let us know if you have any feedback or issues.

Enjoy!

jim-barlow commented 3 years ago

Thanks @sherifnada, @yevhenii-ldv and team, I've created a new (Instagram) connector from 0.1.1 and I can confirm... it seems to work perfectly! I'll get to work transforming the data for our analytics and running some QA but it looks great from my initial inspection, and thanks for also including the JSON fields for the more weirdly structured metric responses. You guys have done a great job and (I believe) created the only way in the world of syncing data across multiple linked IG accounts in one single pipeline, with just one configuration, whilst also eliminating any process for new account onboarding. @michel-tricot and @johnlafleur your team rock!

michel-tricot commented 3 years ago

Thanks @jimbeepbeep!

sherifnada commented 3 years ago

@jimbeepbeep our pleasure! So glad to hear it's useful :)

jim-barlow commented 3 years ago

OK @sherifnada I've set this connection up on a 3 hourly sync and have had the chance to spend a bit more time with this data today. A lot looks right but there are a few issues in the data which we need to look at. As you know it can get a little confusing with the different id fields, but it's important to note that the id field in the users table corresponds to the business_account_id in all of the other tables (where available).

QA for distinct counts: image

  1. stories table is currently empty, however story_insights is not (which means that there must be stories)
  2. It looks like my distinct id counts in users (81), user_insights (81) and user_lifetime_insights (80) and story_insights (55) are approximately as expected - some accounts don't have stories so I would expect story_insights to be lower... however I can't do detailed QA with an empty stories table
  3. However, for the media and media_insights tables there is just a single business_account_id and this means that there is only data for one account in here instead of the full 81. This is also reflected in the row counts (120 for media_insights vs. 2754 for user_insights).
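The distinct-count check described above can be sketched as follows, using in-memory rows in place of the BigQuery tables. It relies on the mapping noted earlier in this comment (id in users, business_account_id elsewhere); the function name and the sample rows are invented for illustration.

```python
from typing import Dict, List


def distinct_account_counts(tables: Dict[str, List[Dict]], id_columns: Dict[str, str]) -> Dict[str, int]:
    """Count distinct account ids per table, using each table's own id column."""
    counts = {}
    for name, rows in tables.items():
        column = id_columns[name]
        ids = {row.get(column) for row in rows}
        ids.discard(None)  # ignore rows that have no account id
        counts[name] = len(ids)
    return counts


# Made-up rows: three accounts in users, but media only covers one of them.
tables = {
    "users": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    "user_insights": [
        {"business_account_id": "a"},
        {"business_account_id": "b"},
        {"business_account_id": "c"},
        {"business_account_id": "a"},
    ],
    "media": [{"business_account_id": "a"}, {"business_account_id": "a"}],
}
id_columns = {"users": "id", "user_insights": "business_account_id", "media": "business_account_id"}

counts = distinct_account_counts(tables, id_columns)
print(counts)
# prints {'users': 3, 'user_insights': 3, 'media': 1}

# Tables covering fewer accounts than users are worth a closer look.
print([name for name, n in counts.items() if n < counts["users"]])
# prints ['media']
```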

Please let me know if there's anything I can do or any further information you need to address these issues.

sherifnada commented 3 years ago

@jimbeepbeep thanks for reporting the issue. Could you share the logs for the relevant jobs? We'll investigate on our side as well.

sherifnada commented 3 years ago

@jimbeepbeep I'm unable to reproduce a similar issue on my side. Let's hope logs can provide some insights!

jim-barlow commented 3 years ago

@sherifnada I have re-run the sync and included the logs below. I have also updated my previous comment, as stories are ephemeral and they do show up in subsequent syncs. However, the issue is still there with media and media_insights:

image

Logs below. It looks like the streams might be bombing out after an OAuthException error; not sure if that's something we can fix on our end... let me know your thoughts.
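As a sketch of what more tolerant handling might look like: the tracebacks below show one record without a media_url key and one failed insights request each ending a whole stream. The code assumes the intent of clear_video_url is to trim everything from "&_nc_rid=" onward (only the find call is visible in the traceback), and `iter_media_insights` with its injected `get_insights` callable is hypothetical. None of this is the connector's actual code.

```python
import logging
from typing import Callable, Dict, Iterable, Iterator

logger = logging.getLogger("instagram_sketch")


def clear_video_url(record: Dict) -> Dict:
    """Trim the assumed tracking suffix from media_url; records without one pass through."""
    media_url = record.get("media_url")
    if media_url:
        end = media_url.find("&_nc_rid=")
        if end != -1:
            record = {**record, "media_url": media_url[:end]}
    return record


def iter_media_insights(media_ids: Iterable[str], get_insights: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield insights per media item, skipping items whose request fails."""
    for media_id in media_ids:
        try:
            insights = get_insights(media_id)
        except Exception as error:  # a real client would catch its specific request error
            logger.warning("Skipping insights for media %s: %s", media_id, error)
            continue
        yield {"id": media_id, **insights}


def fake_get_insights(media_id: str) -> Dict:
    """Stand-in for the API call; fails for media posted before account conversion."""
    if media_id == "old":
        raise RuntimeError("Media posted before business account conversion")
    return {"reach": 5}


print(clear_video_url({"id": "1"}))
# prints {'id': '1'}
print(list(iter_media_insights(["old", "new"], fake_get_insights)))
# prints [{'id': 'new', 'reach': 5}]
```

Skipping silently would hide real problems, so the sketch logs a warning for every item it drops.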

2021-03-24 06:05:34 INFO (/tmp/workspace/58/0) WorkerRun(call):62 - Executing worker wrapper. Airbyte version: AIRBYTE_VERSION
2021-03-24 06:05:34 INFO (/tmp/workspace/58/0) TemporalAttemptExecution(get):79 - Executing worker wrapper. Airbyte version: AIRBYTE_VERSION
2021-03-24 06:05:34 INFO (/tmp/workspace/58/0) DefaultSyncWorker(run):86 - configured sync modes: {stories=full_refresh, user_lifetime_insights=full_refresh, media=full_refresh, story_insights=full_refresh, user_insights=incremental, users=full_refresh, media_insights=full_refresh}
2021-03-24 06:05:34 INFO (/tmp/workspace/58/0) DefaultAirbyteDestination(start):67 - Running target...
2021-03-24 06:05:34 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Checking if airbyte/destination-bigquery:0.2.0 exists...
2021-03-24 06:05:35 DEBUG (/tmp/workspace/58/0) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/58/0 --network host airbyte/destination-bigquery:0.2.0 write --config destination_config.json --catalog destination_catalog.json
2021-03-24 06:05:35 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - airbyte/destination-bigquery:0.2.0 was found locally.
2021-03-24 06:05:35 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Checking if airbyte/source-instagram:0.1.1 exists...
2021-03-24 06:05:35 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - airbyte/source-instagram:0.1.1 was found locally.
2021-03-24 06:05:35 DEBUG (/tmp/workspace/58/0) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/58/0 --network host airbyte/source-instagram:0.1.1 read --config source_config.json --catalog source_catalog.json --state input_state.json
2021-03-24 06:05:37 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:37 INFO i.a.i.d.b.BigQueryDestination(main):390 - {} - starting destination: class io.airbyte.integrations.destination.bigquery.BigQueryDestination
2021-03-24 06:05:37 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:37 INFO i.a.i.b.IntegrationRunner(run):78 - {} - Running integration: io.airbyte.integrations.destination.bigquery.BigQueryDestination
2021-03-24 06:05:37 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:37 INFO i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2021-03-24 06:05:37 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:37 INFO i.a.i.b.IntegrationRunner(run):82 - {} - Command: WRITE
2021-03-24 06:05:37 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:37 INFO i.a.i.b.IntegrationRunner(run):83 - {} - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2021-03-24 06:05:39 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:39 INFO i.a.i.d.b.BigQueryDestination(createTable):264 - {} - Table created successfully
2021-03-24 06:05:39 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:39 INFO i.a.i.d.b.BigQueryDestination(createTable):264 - {} - Table created successfully
2021-03-24 06:05:40 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:40 INFO i.a.i.d.b.BigQueryDestination(createTable):264 - {} - Table created successfully
2021-03-24 06:05:40 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:40 INFO i.a.i.d.b.BigQueryDestination(createTable):264 - {} - Table created successfully
2021-03-24 06:05:40 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:40 INFO i.a.i.d.b.BigQueryDestination(createTable):264 - {} - Table created successfully
2021-03-24 06:05:41 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:41 INFO i.a.i.d.b.BigQueryDestination(createTable):264 - {} - Table created successfully
2021-03-24 06:05:41 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:05:41 INFO i.a.i.d.b.BigQueryDestination(createTable):264 - {} - Table created successfully
2021-03-24 06:05:51 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Starting syncing SourceInstagram
2021-03-24 06:05:51 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Syncing media stream
2021-03-24 06:06:16 ERROR (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):108 - Encountered an exception while reading stream SourceInstagram
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/base_python/source.py", line 88, in read
    yield from self._read_stream(logger=logger, client=client, configured_stream=configured_stream, state=total_state)
  File "/usr/local/lib/python3.7/site-packages/base_python/source.py", line 106, in _read_stream
    for record in client.read_stream(configured_stream.stream):
  File "/usr/local/lib/python3.7/site-packages/base_python/client.py", line 166, in read_stream
    for message in method(fields=fields):
  File "/usr/local/lib/python3.7/site-packages/source_instagram/client/api.py", line 261, in list
    yield clear_video_url(record_data)
  File "/usr/local/lib/python3.7/site-packages/source_instagram/client/api.py", line 46, in clear_video_url
    end_of_string = record_data["media_url"].find("&_nc_rid=")
KeyError: 'media_url'

2021-03-24 06:06:16 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Syncing media_insights stream
2021-03-24 06:06:37 ERROR (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):108 - Insights error: {'error': {'message': 'Invalid parameter', 'type': 'OAuthException', 'code': 100, 'error_data': {'blame_field_specs': [['']]}, 'error_subcode': 2108006, 'is_transient': False, 'error_user_title': 'Media posted before business account conversion', 'error_user_msg': "The media was posted before the most recent time that the user's account was converted to a business account from a personal account.", 'fbtrace_id': 'AsEo4TGbJgj894OcT7XaCFL'}}
2021-03-24 06:06:37 ERROR (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):108 - Encountered an exception while reading stream SourceInstagram
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/base_python/source.py", line 88, in read
    yield from self._read_stream(logger=logger, client=client, configured_stream=configured_stream, state=total_state)
  File "/usr/local/lib/python3.7/site-packages/base_python/source.py", line 106, in _read_stream
    for record in client.read_stream(configured_stream.stream):
  File "/usr/local/lib/python3.7/site-packages/base_python/client.py", line 166, in read_stream
    for message in method(fields=fields):
  File "/usr/local/lib/python3.7/site-packages/source_instagram/client/api.py", line 315, in list
    **{record.get("name"): record.get("values")[0]["value"] for record in self._get_insights(ig_media)},
  File "/usr/local/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/source_instagram/client/api.py", line 337, in _get_insights
    raise error
  File "/usr/local/lib/python3.7/site-packages/source_instagram/client/api.py", line 334, in _get_insights
    return item.get_insights(params={"metric": metrics})
  File "/usr/local/lib/python3.7/site-packages/facebook_business/adobjects/igmedia.py", line 249, in get_insights
    return request.execute()
  File "/usr/local/lib/python3.7/site-packages/facebook_business/api.py", line 677, in execute
    cursor.load_next_page()
  File "/usr/local/lib/python3.7/site-packages/facebook_business/api.py", line 844, in load_next_page
    params=self.params,
  File "/usr/local/lib/python3.7/site-packages/facebook_business/api.py", line 350, in call
    raise fb_response.error()
facebook_business.exceptions.FacebookRequestError: 

  Message: Call was not successful
  Method:  GET
  Path:    https://graph.facebook.com/v10.0/17948370028410575/insights
  Params:  {'metric': '["carousel_album_engagement","carousel_album_impressions","carousel_album_reach","carousel_album_saved"]'}

  Status:  400
  Response:
    {
      "error": {
        "message": "Invalid parameter",
        "type": "OAuthException",
        "code": 100,
        "error_data": {
          "blame_field_specs": [
            [
              ""
            ]
          ]
        },
        "error_subcode": 2108006,
        "is_transient": false,
        "error_user_title": "Media posted before business account conversion",
        "error_user_msg": "The media was posted before the most recent time that the user's account was converted to a business account from a personal account.",
        "fbtrace_id": "AsEo4TGbJgj894OcT7XaCFL"
      }
    }

2021-03-24 06:06:37 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Syncing stories stream
2021-03-24 06:06:48 ERROR (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):108 - Encountered an exception while reading stream SourceInstagram
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/base_python/source.py", line 88, in read
    yield from self._read_stream(logger=logger, client=client, configured_stream=configured_stream, state=total_state)
  File "/usr/local/lib/python3.7/site-packages/base_python/source.py", line 106, in _read_stream
    for record in client.read_stream(configured_stream.stream):
  File "/usr/local/lib/python3.7/site-packages/base_python/client.py", line 166, in read_stream
    for message in method(fields=fields):
  File "/usr/local/lib/python3.7/site-packages/source_instagram/client/api.py", line 290, in list
    yield clear_video_url(record_data)
  File "/usr/local/lib/python3.7/site-packages/source_instagram/client/api.py", line 46, in clear_video_url
    end_of_string = record_data["media_url"].find("&_nc_rid=")
KeyError: 'media_url'

2021-03-24 06:06:48 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Syncing story_insights stream
2021-03-24 06:07:27 ERROR (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):108 - Insights error: (#10) Not enough viewers for the media to show insights
2021-03-24 06:08:43 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Set state of user_insights stream to {'17841400192320675': '2021-03-23T07:00:00+00:00', '17841400284745138': '2021-03-23T07:00:00+00:00', '17841400346850774': '2021-03-23T07:00:00+00:00', '17841400489293275': '2021-03-23T07:00:00+00:00', '17841401318740840': '2021-03-23T07:00:00+00:00', '17841401446234888': '2021-03-23T07:00:00+00:00', '17841401781548125': '2021-03-23T07:00:00+00:00', '17841401980959720': '2021-03-23T07:00:00+00:00', '17841402916966028': '2021-03-23T07:00:00+00:00', '17841402958308126': '2021-03-23T07:00:00+00:00', '17841403029079681': '2021-03-23T07:00:00+00:00', '17841403071217400': '2021-03-23T07:00:00+00:00', '17841403177990120': '2021-03-23T07:00:00+00:00', '17841403348750148': '2021-03-23T07:00:00+00:00', '17841403497459345': '2021-03-23T07:00:00+00:00', '17841403573287506': '2021-03-23T07:00:00+00:00', '17841403884008942': '2021-03-23T07:00:00+00:00', '17841404108797506': '2021-03-23T07:00:00+00:00', '17841404135689658': '2021-03-23T07:00:00+00:00', '17841404181558895': '2021-03-23T07:00:00+00:00', '17841404189378325': '2021-03-23T07:00:00+00:00', '17841404191176661': '2021-03-23T07:00:00+00:00', '17841404204706856': '2021-03-23T07:00:00+00:00', '17841404206518515': '2021-03-23T07:00:00+00:00', '17841404217634823': '2021-03-23T07:00:00+00:00', '17841404222658404': '2021-03-23T07:00:00+00:00', '17841404247169359': '2021-03-23T07:00:00+00:00', '17841404249556597': '2021-03-23T07:00:00+00:00', '17841404277273118': '2021-03-23T07:00:00+00:00', '17841404289348916': '2021-03-23T07:00:00+00:00', '17841404291224111': '2021-03-23T07:00:00+00:00', '17841404295498412': '2021-03-23T07:00:00+00:00', '17841404309648630': '2021-03-23T07:00:00+00:00', '17841404414595147': '2021-03-23T07:00:00+00:00', '17841404414625698': '2021-03-23T07:00:00+00:00', '17841404417265499': '2021-03-23T07:00:00+00:00', '17841404479139399': '2021-03-23T07:00:00+00:00', 
'17841404515155637': '2021-03-23T07:00:00+00:00', '17841404526875296': '2021-03-23T07:00:00+00:00', '17841404550164361': '2021-03-23T07:00:00+00:00', '17841405395017198': '2021-03-23T07:00:00+00:00', '17841406062416044': '2021-03-23T07:00:00+00:00', '17841407013699980': '2021-03-23T07:00:00+00:00', '17841407037879823': '2021-03-23T07:00:00+00:00', '17841407132139874': '2021-03-23T07:00:00+00:00', '17841407156466071': '2021-03-23T07:00:00+00:00', '17841407200701023': '2021-03-23T07:00:00+00:00', '17841407218221212': '2021-03-23T07:00:00+00:00', '17841407224931898': '2021-03-23T07:00:00+00:00', '17841407336201984': '2021-03-23T07:00:00+00:00', '17841407357301285': '2021-03-23T07:00:00+00:00', '17841407382381164': '2021-03-23T07:00:00+00:00', '17841407382430084': '2021-03-23T07:00:00+00:00', '17841407401991982': '2021-03-23T07:00:00+00:00', '17841407437152137': '2021-03-23T07:00:00+00:00', '17841407477879305': '2021-03-23T07:00:00+00:00', '17841407479889231': '2021-03-23T07:00:00+00:00', '17841407517216804': '2021-03-23T07:00:00+00:00', '17841407535576014': '2021-03-23T07:00:00+00:00', '17841407542049081': '2021-03-23T07:00:00+00:00', '17841407606405883': '2021-03-23T07:00:00+00:00', '17841407622209315': '2021-03-23T07:00:00+00:00', '17841407695930820': '2021-03-23T07:00:00+00:00', '17841407758391657': '2021-03-23T07:00:00+00:00', '17841407761931817': '2021-03-23T07:00:00+00:00', '17841407783327203': '2021-03-23T07:00:00+00:00', '17841407802011612': '2021-03-23T07:00:00+00:00', '17841407804681480': '2021-03-23T07:00:00+00:00', '17841407875481752': '2021-03-23T07:00:00+00:00', '17841408081871487': '2021-03-23T07:00:00+00:00', '17841408181124483': '2021-03-23T07:00:00+00:00', '17841408542071957': '2021-03-23T07:00:00+00:00', '17841413777302998': '2021-03-23T07:00:00+00:00', '17841413976319144': '2021-03-23T07:00:00+00:00', '17841426779704172': '2021-03-23T07:00:00+00:00', '17841430918712351': '2021-03-23T07:00:00+00:00', '17841431142927045': '2021-03-23T07:00:00+00:00', 
'17841431308458310': '2021-03-23T07:00:00+00:00', '17841431327763937': '2021-03-23T07:00:00+00:00', '17841431374667263': '2021-03-23T07:00:00+00:00', '17841431398606029': '2021-03-23T07:00:00+00:00'}
2021-03-24 06:08:43 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Syncing user_insights stream
2021-03-24 06:08:44 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Syncing user_lifetime_insights stream
2021-03-24 06:08:58 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Syncing users stream
2021-03-24 06:09:07 ERROR (/tmp/workspace/58/0) LineGobbler(voidCall):69 - /usr/local/lib/python3.7/site-packages/facebook_business/utils/api_utils.py:30: UserWarning: media does not allow field children
2021-03-24 06:09:07 ERROR (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   warnings.warn(message)
2021-03-24 06:09:07 ERROR (/tmp/workspace/58/0) LineGobbler(voidCall):69 - /usr/local/lib/python3.7/site-packages/facebook_business/utils/api_utils.py:30: UserWarning: value of metric might not be compatible.  Expect list<metric_enum>;  got <class 'list'>
2021-03-24 06:09:07 ERROR (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   warnings.warn(message)
2021-03-24 06:09:07 ERROR (/tmp/workspace/58/0) LineGobbler(voidCall):69 - /usr/local/lib/python3.7/site-packages/facebook_business/utils/api_utils.py:30: UserWarning: value of period might not be compatible.  Expect list<period_enum>;  got <class 'str'>
2021-03-24 06:09:07 ERROR (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   warnings.warn(message)
2021-03-24 06:09:07 INFO (/tmp/workspace/58/0) DefaultAirbyteStreamFactory(internalLog):110 - Finished syncing SourceInstagram
2021-03-24 06:09:07 DEBUG (/tmp/workspace/58/0) DefaultAirbyteSource(close):109 - Closing tap process
2021-03-24 06:09:07 DEBUG (/tmp/workspace/58/0) DefaultAirbyteDestination(close):105 - Closing target process
2021-03-24 06:09:07 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:09:07 INFO i.a.i.b.FailureTrackingConsumer(close):64 - {} - hasFailed: false.
2021-03-24 06:09:11 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:09:11 ERROR i.a.i.d.b.BigQueryDestination$RecordConsumer(close):344 - {} - executing on success close procedure.
2021-03-24 06:09:23 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:09:23 INFO i.a.i.b.IntegrationRunner(run):120 - {} - Completed integration: io.airbyte.integrations.destination.bigquery.BigQueryDestination
2021-03-24 06:09:23 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 2021-03-24 06:09:23 INFO i.a.i.d.b.BigQueryDestination(main):392 - {} - completed destination: class io.airbyte.integrations.destination.bigquery.BigQueryDestination
2021-03-24 06:09:23 INFO (/tmp/workspace/58/0) DefaultSyncWorker(run):113 - Running normalization.
2021-03-24 06:09:23 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Checking if airbyte/normalization:0.1.15 exists...
2021-03-24 06:09:23 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - airbyte/normalization:0.1.15 was found locally.
2021-03-24 06:09:23 DEBUG (/tmp/workspace/58/0) DockerProcessBuilderFactory(create):104 - Preparing command: docker run --rm -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/58/0/normalize --network host airbyte/normalization:0.1.15 run --integration-type bigquery --config destination_config.json --catalog destination_catalog.json
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Namespace(config='destination_config.json', integration_type=<DestinationType.bigquery: 'bigquery'>, out='/data/58/0/normalize')
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - transform_bigquery
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Processing destination_catalog.json...
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_ab1.sql from media
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_ab2.sql from media
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_ab3.sql from media
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/media.sql from media
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_insights_ab1.sql from media_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_insights_ab2.sql from media_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_insights_ab3.sql from media_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/media_insights.sql from media_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/stories_ab1.sql from stories
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/stories_ab2.sql from stories
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/stories_ab3.sql from stories
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/stories.sql from stories
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/story_insights_ab1.sql from story_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/story_insights_ab2.sql from story_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/story_insights_ab3.sql from story_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/story_insights.sql from story_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/user_insights_ab1.sql from user_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/user_insights_ab2.sql from user_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/user_insights_ab3.sql from user_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/user_insights.sql from user_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/user_lifetime_insights_ab1.sql from user_lifetime_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/user_lifetime_insights_ab2.sql from user_lifetime_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/user_lifetime_insights_ab3.sql from user_lifetime_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/user_lifetime_insights.sql from user_lifetime_insights
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/users_ab1.sql from users
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/users_ab2.sql from users
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/users_ab3.sql from users
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/users.sql from users
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_0f8_owner_ab1.sql from media/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_0f8_owner_ab2.sql from media/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_0f8_owner_ab3.sql from media/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/media_0f8_owner.sql from media/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_d39_children_ab1.sql from media/children
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_d39_children_ab2.sql from media/children
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_d39_children_ab3.sql from media/children
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/media_d39_children.sql from media/children
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/stories_ce2_owner_ab1.sql from stories/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/stories_ce2_owner_ab2.sql from stories/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/stories_ce2_owner_ab3.sql from stories/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/stories_ce2_owner.sql from stories/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Ignoring substream 'value' from user_lifetime_insights/value because properties list is empty
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_children_4f7_owner_ab1.sql from media/children/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_children_4f7_owner_ab2.sql from media/children/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_views/airbyte_tripscout_instagram/media_children_4f7_owner_ab3.sql from media/children/owner
2021-03-24 06:09:24 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 -   Generating airbyte_tables/airbyte_tripscout_instagram/media_children_4f7_owner.sql from media/children/owner
2021-03-24 06:09:26 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Running with dbt=0.18.1
2021-03-24 06:09:29 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Found 44 models, 0 tests, 0 snapshots, 0 analyses, 341 macros, 0 operations, 0 seed files, 7 sources
2021-03-24 06:09:29 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 
2021-03-24 06:09:29 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:29 | Concurrency: 32 threads (target='prod')
2021-03-24 06:09:29 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:29 | 
2021-03-24 06:09:30 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:30 | 1 of 11 START table model airbyte_tripscout_instagram.media_insights......................................... [RUN]
2021-03-24 06:09:30 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:30 | 2 of 11 START table model airbyte_tripscout_instagram.user_insights.......................................... [RUN]
2021-03-24 06:09:30 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:30 | 3 of 11 START table model airbyte_tripscout_instagram.user_lifetime_insights................................. [RUN]
2021-03-24 06:09:30 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:30 | 4 of 11 START table model airbyte_tripscout_instagram.story_insights......................................... [RUN]
2021-03-24 06:09:30 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:30 | 5 of 11 START table model airbyte_tripscout_instagram.stories................................................ [RUN]
2021-03-24 06:09:30 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:30 | 7 of 11 START table model airbyte_tripscout_instagram.users.................................................. [RUN]
2021-03-24 06:09:30 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:30 | 6 of 11 START table model airbyte_tripscout_instagram.media.................................................. [RUN]
2021-03-24 06:09:32 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:32 | 7 of 11 OK created table model airbyte_tripscout_instagram.users............................................. [CREATE TABLE (81.0 rows, 47.8 KB processed) in 1.87s]
2021-03-24 06:09:32 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:32 | 1 of 11 OK created table model airbyte_tripscout_instagram.media_insights.................................... [CREATE TABLE (120.0 rows, 20.1 KB processed) in 2.28s]
2021-03-24 06:09:32 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:32 | 2 of 11 OK created table model airbyte_tripscout_instagram.user_insights..................................... [CREATE TABLE (2.8k rows, 1.6 MB processed) in 2.13s]
2021-03-24 06:09:32 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:32 | 4 of 11 OK created table model airbyte_tripscout_instagram.story_insights.................................... [CREATE TABLE (191.0 rows, 36.3 KB processed) in 2.10s]
2021-03-24 06:09:32 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:32 | 3 of 11 OK created table model airbyte_tripscout_instagram.user_lifetime_insights............................ [CREATE TABLE (320.0 rows, 242.9 KB processed) in 2.13s]
2021-03-24 06:09:32 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:32 | 6 of 11 OK created table model airbyte_tripscout_instagram.media............................................. [CREATE TABLE (356.0 rows, 400.4 KB processed) in 2.14s]
2021-03-24 06:09:32 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:32 | 8 of 11 START table model airbyte_tripscout_instagram.media_0f8_owner........................................ [RUN]
2021-03-24 06:09:33 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:33 | 9 of 11 START table model airbyte_tripscout_instagram.media_d39_children..................................... [RUN]
2021-03-24 06:09:33 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:33 | 5 of 11 OK created table model airbyte_tripscout_instagram.stories........................................... [CREATE TABLE (22.0 rows, 19.6 KB processed) in 2.75s]
2021-03-24 06:09:33 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:33 | 10 of 11 START table model airbyte_tripscout_instagram.stories_ce2_owner..................................... [RUN]
2021-03-24 06:09:34 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:34 | 8 of 11 OK created table model airbyte_tripscout_instagram.media_0f8_owner................................... [CREATE TABLE (356.0 rows, 24.3 KB processed) in 1.63s]
2021-03-24 06:09:34 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:34 | 10 of 11 OK created table model airbyte_tripscout_instagram.stories_ce2_owner................................ [CREATE TABLE (22.0 rows, 1.5 KB processed) in 1.27s]
2021-03-24 06:09:35 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:35 | 9 of 11 OK created table model airbyte_tripscout_instagram.media_d39_children................................ [CREATE TABLE (147.0 rows, 92.0 KB processed) in 2.19s]
2021-03-24 06:09:35 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:35 | 11 of 11 START table model airbyte_tripscout_instagram.media_children_4f7_owner.............................. [RUN]
2021-03-24 06:09:36 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:36 | 11 of 11 OK created table model airbyte_tripscout_instagram.media_children_4f7_owner......................... [CREATE TABLE (147.0 rows, 10.0 KB processed) in 1.49s]
2021-03-24 06:09:36 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:36 | 
2021-03-24 06:09:36 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 06:09:36 | Finished running 11 table models in 7.55s.
2021-03-24 06:09:36 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 
2021-03-24 06:09:36 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Completed successfully
2021-03-24 06:09:36 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - 
2021-03-24 06:09:36 INFO (/tmp/workspace/58/0) LineGobbler(voidCall):69 - Done. PASS=11 WARN=0 ERROR=0 SKIP=0 TOTAL=11
2021-03-24 06:09:37 DEBUG (/tmp/workspace/58/0) DefaultNormalizationRunner(close):97 - Closing tap process
2021-03-24 06:09:37 INFO (/tmp/workspace/58/0) DefaultSyncWorker(run):130 - sync summary: io.airbyte.config.StandardSyncSummary@60ecdbb3[status=completed,recordsSynced=1090,bytesSynced=774574,startTime=1616565934923,endTime=1616566177262]
sherifnada commented 3 years ago

@jimbeepbeep we did some digging and found two issues:

  1. There is a bug that occasionally crops up when pulling data for the Media Insights stream. This is a bug on our side, and a fix is in progress.
  2. Instagram doesn't allow pulling Media assets that were uploaded before the Instagram account was turned into a business account (it throws an error when you try to do this). There is a bug in the connector where we stop syncing the whole stream instead of attempting to sync all remaining media assets (the ones which were uploaded after the account became a business account).

@yevhenii-ldv is working on a fix for both issues and we'll have one for you soon!
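
For illustration, the behaviour described in point 2 (skip the media that cannot be read and keep syncing the rest) could be sketched as below. This is not the connector's actual code: `MediaUnavailableError`, `sync_media_insights`, and the `fetch` callable are hypothetical names used only for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


class MediaUnavailableError(Exception):
    """Hypothetical error: media was posted before the business conversion."""


def sync_media_insights(
    media_ids: Iterable[str],
    fetch: Callable[[str], Dict],
    skipped: List[Tuple[str, str]],
) -> Iterator[Dict]:
    """Yield insights per media item, skipping items the API refuses to serve."""
    for media_id in media_ids:
        try:
            yield fetch(media_id)
        except MediaUnavailableError as exc:
            # Record the failure and move on instead of aborting the stream.
            skipped.append((media_id, str(exc)))
```

Collecting the skipped IDs in a list keeps the failures visible in the logs without hiding the records that did sync.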

jim-barlow commented 3 years ago

Thanks team!

yevhenii-ldv commented 3 years ago

@jimbeepbeep

We just merged this bugfix into master and released a new version of the connector.

Please upgrade your Instagram connector to version 0.1.2 to pick up the bugfix. To upgrade your connector version, go to the admin panel on the left-hand side of the UI, find this connector in the list, and input the latest connector version.

Please let us know if you have any further questions.

Enjoy!

jim-barlow commented 3 years ago

Thanks @yevhenii-ldv, I've upgraded the connector and re-run the sync. It looks like it's fixed the media_insights issue, which is awesome. However, there are still only 3/81 accounts coming through for media:

image

Logs attached, let me know if you need anything else:

logs-80-0.txt

sherifnada commented 3 years ago

@jimbeepbeep could you try manually triggering a sync? I'm seeing this error in the logs:

  Status:  500
  Response:
    {
      "error": {
        "message": "An unexpected error has occurred. Please retry your request later.",
        "type": "OAuthException",
        "is_transient": true,
        "code": 2,
        "fbtrace_id": "AJ3EQr2jeWlBOTQFNzQWhUk"
      }
    }

This was an internal error in the FB server that caused the connector to fail. Normally, the connector should retry the sync, but we discovered a related open bug yesterday: https://github.com/airbytehq/airbyte/issues/2616 . Rerunning the sync should show you these media assets.
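
As a rough sketch, a client could decide whether a failed response is worth retrying by reading the `is_transient` flag from the error payload shown above. The field names come from that payload; the function name `is_transient_error` and the decision to treat the flag as the retry signal are assumptions made for this example, not the connector's implementation.

```python
import json


def is_transient_error(body: str) -> bool:
    """Return True if a Graph API error body is flagged as transient."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, so there is no flag to trust.
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    return error.get("is_transient") is True
```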

jim-barlow commented 3 years ago

Thanks @sherifnada, I have just re-run the sync and it seems to be picking up media from more accounts (7/82), but not all of them: image

The sync now takes over 4 hours, and most of that time seems to be spent handling these "Media posted before business account conversion" errors. Logs attached (access token redacted):

logs-87-0.txt

Let me know if you need anything else.

sherifnada commented 3 years ago

@jimbeepbeep we've identified this issue and will address it in #2626. The problem is that the FB server has transient 500 failures that we should just back off on and retry. We'll ping you here when that issue is addressed.
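
A minimal sketch of the back-off-and-retry idea, assuming exponential delays and a hypothetical `TransientApiError` raised by the request callable. The names, the attempt count, and the delay schedule are illustrative assumptions and do not describe the fix tracked in #2626.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientApiError(Exception):
    """Hypothetical error raised when the server returns a transient 500."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying transient failures with exponential delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientApiError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter makes the delay schedule easy to check in a test without actually waiting.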

jim-barlow commented 3 years ago

Thanks, as always @sherifnada!

yevhenii-ldv commented 3 years ago

@jimbeepbeep

We just merged the bugfix for issue #2626 into master and released a new version of the connector.

Please upgrade your Instagram connector to version 0.1.3 to pick up the bugfix. To upgrade your connector version, go to the admin panel on the left-hand side of the UI, find this connector in the list, and input the latest connector version.

I hope this bugfix will fix the remaining problems with the Instagram Connector. Please let us know if you have any further questions.

Enjoy!

jim-barlow commented 3 years ago

Great stuff @yevhenii-ldv and @sherifnada, I've re-run the sync and it completed fine... image You guys are amazing, thanks!

jim-barlow commented 3 years ago

Hey guys, not sure where the best place to raise issues is... let me know if I should use Slack instead of here. I'm just looking through historic logs and there are a few failures, plus the sync from last night (6:13PM UTC) is still running after 2 failed attempts, about 17 hours later:

Normally this completes as per yesterday morning: Succeeded 157.25 MB | 201,643 records | 6h 55m 16s | Sync

Logs below (access token redacted). I have checked the Facebook App dashboard and we're nowhere near app rate limits so it's definitely not that.

Attempt 1: logs-52-0.txt
Attempt 2: logs-52-1.txt
Attempt 3: logs-52-2.txt

I'm running airbyte/source-instagram 0.1.4 and Airbyte 0.19.0-alpha. I haven't updated in a while because this data is mission critical for a client, and the upgrade process looks complicated enough that I'm worried I'll interrupt a sync or not be able to get it back up and running. Is there a simpler approach (or a script I can run) rather than this process in the docs?

jim-barlow commented 3 years ago

@yevhenii-ldv / @sherifnada this connector is now failing (screenshot below). Let me know if you need any more logs in addition to the above.

image

yevhenii-ldv commented 3 years ago

Hello @jimbeepbeep. Thank you for your comment and for raising the issue. We'll investigate this error and your logs and find the cause of the problem.

We'll ping you here as soon as we have new information on this issue.

sherifnada commented 3 years ago

@jimbeepbeep we've created an issue to track this here: https://github.com/airbytehq/airbyte/issues/3241

FWIW it's easier to create new issues for the instagram connector going forward

jim-barlow commented 3 years ago

Awesome thanks