airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte

Source Amazon Seller Partner: Connector Failed Error due to JSON Schema #362

Open andrzejdackiewicz opened 2 months ago

andrzejdackiewicz commented 2 months ago

Connector


source-amazon-seller-partner

Issue


I am using PyAirbyte to run a migration from the Amazon API to BigQuery. Here is the code I am running:

```python
def fetch_pyairbyte_data(aws_environment, region, account_type, app_id,
                         client_secret, refresh_token, project_name,
                         dataset_name, stream):
    import airbyte as ab
    from airbyte.caches.bigquery import BigQueryCache

    source = ab.get_source(
        "source-amazon-seller-partner",
        config={
            "aws_environment": aws_environment,
            "region": region,
            "account_type": account_type,
            "lwa_app_id": app_id,
            "lwa_client_secret": client_secret,
            "refresh_token": refresh_token,
            "replication_start_date": "2024-09-08T00:00:00Z",
            "report_options_list": [
                {
                    "report_name": "GET_VENDOR_SALES_REPORT",
                    "stream_name": "GET_VENDOR_SALES_REPORT",
                    "options_list": [
                        {
                            "option_name": "reportPeriod",
                            "option_value": "DAY"
                        },
                        {
                            "option_name": "distributorView",
                            "option_value": "SOURCING"
                        },
                        {
                            "option_name": "sellingProgram",
                            "option_value": "RETAIL"
                        }
                    ]
                },
            ]
        },
        install_if_missing=True,
    )

    cache = BigQueryCache(
        project_name=project_name,
        dataset_name=dataset_name,
    )

    source.select_streams(stream)

    result = source.read(cache=cache)
```

During the run I can see that some data is fetched from the source, but the load then fails with a JSON parsing error:

```
[2024-09-13, 04:37:48 UTC] {process_utils.py:190} INFO - google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0; reason: invalid, message: Error while reading data, error message: JSON parsing error in row starting at position 0: Couldn't convert value to timestamp: Could not parse '2024-09-09' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]] Field: startdate; Value: 2024-09-09
```
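The mismatch the error describes can be reproduced outside BigQuery: a plain date string like `2024-09-09` satisfies a date format but not the `YYYY-MM-DD HH:MM[:SS]` timestamp format the message cites. A minimal sketch using only the standard library:

```python
from datetime import datetime

value = "2024-09-09"  # the value BigQuery received for the startdate field

# Parsing as a date succeeds...
parsed_date = datetime.strptime(value, "%Y-%m-%d")

# ...but parsing with a timestamp format (date plus time) fails,
# mirroring BigQuery's "Couldn't convert value to timestamp" error.
try:
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    timestamp_ok = True
except ValueError:
    timestamp_ok = False

print(parsed_date.date(), timestamp_ok)  # → 2024-09-09 False
```

This is consistent with the column being declared as a timestamp (via the connector's `date-time` schema format) while the connector emits bare dates.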

I thought that perhaps the Airbyte source connector's schema is wrong: the field should be declared with the `date` format instead of `date-time`. I am willing to submit a fix for that, but I wanted confidence that it would actually resolve the issue.
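If the connector schema is indeed the culprit, the fix would presumably be a one-line change in the stream's JSON schema, switching the field's format from `date-time` to `date` (field name and exact schema layout are assumptions based on the error message, not confirmed against the connector source):

```json
{
  "startDate": {
    "type": ["null", "string"],
    "format": "date"
  }
}
```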

I ran the same pipeline locally on a Docker deployment of Airbyte and the migration succeeded. So the issue, as I see it, is on the Airbyte side, but it only surfaces when you use PyAirbyte.

andrzejdackiewicz commented 2 months ago

Version 4.4.1 of the Amazon Seller Partner connector is being used. The same source connector version was used in the Docker Airbyte migration, and there it succeeded in writing the data to BigQuery.