I am using PyAirbyte to run a migration from the Amazon Seller Partner API to BigQuery. Here is the (truncated) code I am running:
```python
def fetch_pyairbyte_data(aws_environment, region, account_type, app_id, client_secret, refresh_token, project_name, dataset_name, stream):
    import airbyte as ab
    from airbyte.caches.bigquery import BigQueryCache
```
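The rest of the function follows the standard PyAirbyte flow (get the source, select the stream, read into a `BigQueryCache`). A minimal sketch of the whole function under that assumption; the config key names and the credentials path are placeholders and may not match my real code exactly:

```python
import airbyte as ab
from airbyte.caches.bigquery import BigQueryCache


def fetch_pyairbyte_data(aws_environment, region, account_type, app_id,
                         client_secret, refresh_token, project_name,
                         dataset_name, stream):
    # Configure the Amazon Seller Partner source. Key names follow the
    # source-amazon-seller-partner spec; adjust if your spec version differs.
    source = ab.get_source(
        "source-amazon-seller-partner",
        config={
            "aws_environment": aws_environment,
            "region": region,
            "account_type": account_type,
            "lwa_app_id": app_id,
            "lwa_client_secret": client_secret,
            "refresh_token": refresh_token,
        },
        install_if_missing=True,
    )
    source.check()

    # Read only the requested stream into a BigQuery-backed cache.
    source.select_streams([stream])
    cache = BigQueryCache(
        project_name=project_name,
        dataset_name=dataset_name,
        credentials_path="/path/to/service_account.json",  # placeholder
    )
    return source.read(cache=cache)
```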
During the run I can see that data is fetched from the source, but a JSON parsing error occurs when it is loaded into BigQuery:
```
[2024-09-13, 04:37:48 UTC] {process_utils.py:190} INFO - google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0; reason: invalid, message: Error while reading data, error message: JSON parsing error in row starting at position 0: Couldn't convert value to timestamp: Could not parse '2024-09-09' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]] Field: startdate; Value: 2024-09-09
```
I thought that perhaps the error lies in the Airbyte source connector's schema: the `startdate` field is declared with the `date-time` format when it should be `date`. I am willing to make a fix for that, but I wanted confidence that it would actually resolve the issue.
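To build that confidence, one thing I can do is compare what the connector declares for `startdate` against what the records actually contain, directly through PyAirbyte. A rough sketch, assuming PyAirbyte's `discovered_catalog` property and `get_records()`; the stream name and config values below are placeholders:

```python
import airbyte as ab

# Same source configuration as in fetch_pyairbyte_data (placeholder values here).
source = ab.get_source(
    "source-amazon-seller-partner",
    config={
        "aws_environment": "PRODUCTION",
        "region": "US",
        "account_type": "Seller",
        "lwa_app_id": "<app-id>",
        "lwa_client_secret": "<client-secret>",
        "refresh_token": "<refresh-token>",
    },
)

STREAM = "GET_SALES_AND_TRAFFIC_REPORT"  # placeholder: the stream that fails for me

# 1) What does the connector's declared schema say about startdate?
for s in source.discovered_catalog.streams:
    if s.name == STREAM:
        print(s.json_schema["properties"].get("startdate"))

# 2) What do the records actually contain? If the schema says "date-time" while
#    the records hold plain dates like "2024-09-09", changing the format to
#    "date" should let BigQuery load the rows.
for record in source.get_records(STREAM):
    print(record.get("startdate"))
    break
```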
I ran the same pipeline locally on a Docker deployment of Airbyte and the migration succeeds. So the issue, as I see it, is in Airbyte, but it only surfaces when using PyAirbyte.
Version 4.4.1 of the Amazon Seller Partner connector is being used. The same source connector version was used in the Docker Airbyte migration, and it succeeded in writing the data to BigQuery.
Connector: source-amazon-seller-partner