airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destination BigQuery: DynamoDB -> BigQuery: Only top-level primary keys are supported #30250

Open Heedster opened 1 year ago

Heedster commented 1 year ago

Connector Name

destination-bigquery

Connector Version

2.0.2

What step the error happened?

During the sync

Relevant information

When I sync from my DynamoDB table to BigQuery, I get this error: Stack Trace: java.lang.IllegalArgumentException: Only top-level primary keys are supported

Further changes and a "Reset" also produce the same error.

One suspicious thing is the way the DynamoDB primary keys are captured and shown in the UI.

[Screenshot 2023-09-07 at 5:28:10 PM]

My primary key is actually "id", but the UI shows: batchId.createdAt.createdBy.createdOn.id.updatedAt.updatedOn. I am unable to edit it; when I try, the primary-key column's edit cursor is disabled.

batchId, createdAt, createdBy, createdOn, id, updatedAt, and updatedOn are all top-level attributes, and are the hash or range keys of the table's various indexes. DynamoDB does not even allow non-top-level attributes to be used as keys. id is the hash key of the table itself. I don't know why the source connector combines them this way.
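For reference, a DynamoDB table's own key schema only ever contains the table's hash (and optional range) key; GSI and LSI keys live under separate index entries in the DescribeTable response. A minimal sketch, assuming a DescribeTable-shaped dict, of how a source could derive only the table's own key (the function and sample table here are illustrative, not Airbyte's actual connector code):

```python
def table_primary_key(describe_table_response: dict) -> list[list[str]]:
    """Return Airbyte-style primary-key paths from a DynamoDB
    DescribeTable response: one single-element path per key attribute
    of the table itself, ignoring GSI/LSI key schemas."""
    key_schema = describe_table_response["Table"]["KeySchema"]
    return [[element["AttributeName"]] for element in key_schema]


# Abridged, hypothetical DescribeTable payload: the table's own key is
# "id"; "batchId" is only a GSI hash key and should not become a PK.
response = {
    "Table": {
        "TableName": "example",
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "byBatch",
                "KeySchema": [{"AttributeName": "batchId", "KeyType": "HASH"}],
            },
        ],
    }
}

print(table_primary_key(response))  # [['id']]
```

Under that assumption, the combined seven-field key in the screenshot would suggest the source is folding index key attributes into the table's primary key as well.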

Update

This could just be a UI thing.

When I make an API call, it shows up properly as a list, but as a list of a list (maybe that's the problem?):

```json
{
    "data": [
        {
            "connectionId": "fefcff54-2bca-4839-9eb2-c03d456dc1fa",
            "name": "DynamoDB → BigQuery",
            "sourceId": "905b96d5-216a-409c-8c22-0da6f6195361",
            "destinationId": "749cd4c5-9d3f-4824-9a31-f96a848e90cf",
            "workspaceId": "50665ff8-21c6-4586-a791-406a74c3b9d0",
            "status": "active",
            "schedule": {
                "scheduleType": "basic",
                "basicTiming": "Every 24 hours"
            },
            "dataResidency": "auto",
            "nonBreakingSchemaUpdatesBehavior": "ignore",
            "namespaceDefinition": "destination",
            "namespaceFormat": "${SOURCE_NAMESPACE}",
            "configurations": {
                "streams": [
                    {
                        "name": "REDACTED",
                        "syncMode": "incremental_deduped_history",
                        "cursorField": [
                            "updatedOn"
                        ],
                        "primaryKey": [
                            [
                                "batchId",
                                "createdAt",
                                "createdBy",
                                "createdOn",
                                "id",
                                "updatedAt",
                                "updatedOn"
                            ]
                        ]
                    }
                ]
            }
        }
    ]
}
```
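In Airbyte's catalog model, `primaryKey` is a list of key *paths*, where each path is itself a list of field names (so nested fields can be addressed). A composite key of top-level fields should therefore look like `[["id"], ["updatedOn"]]`, whereas the payload above encodes one seven-element path, which reads as a single deeply nested field. A rough sketch of the kind of check the destination's error message implies (a hypothetical helper, not the connector's actual code):

```python
def check_primary_key(primary_key: list[list[str]]) -> None:
    """Reject any key path longer than one element, i.e. any
    primary key that points at a nested (non-top-level) field."""
    for path in primary_key:
        if len(path) != 1:
            raise ValueError("Only top-level primary keys are supported")


check_primary_key([["id"]])                 # single top-level key: OK
check_primary_key([["id"], ["updatedOn"]])  # composite of top-level keys: OK
try:
    # the shape from the payload above: one nested seven-field path
    check_primary_key([["batchId", "createdAt", "createdBy", "createdOn",
                        "id", "updatedAt", "updatedOn"]])
except ValueError as err:
    print(err)  # Only top-level primary keys are supported
```

If that reading is right, the bug would be in how the source (or the UI) serializes the key, not in the destination's validation itself.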

Log file

Relevant log output

No response


MikeAtJulaya commented 6 months ago

Hi team 👋🏼

We are stuck with this similar problem. Do you have any idea of a workaround before the fix?

thanks a lot 🙏🏼 cheers 🍻

stevegoulet commented 5 months ago

I am also seeing this error when using PostgreSQL as a destination. Do we need another task for that use case?

java.lang.IllegalArgumentException: Only top-level primary keys are supported

pdecarcer commented 5 months ago

Same here! It was working with Postgres before

ATP-rferraioli commented 3 months ago

Same issue :(

java.lang.IllegalArgumentException: Only top-level primary keys are supported

lschinasi commented 2 months ago

Same for me, except the destination is Snowflake.

lschinasi commented 2 months ago

> Same here! It was working with Postgres before

Also worked for me before and suddenly stopped (destination is Snowflake).

luaoliveira commented 2 months ago

I am having the same problem connecting DynamoDB to Snowflake. At first it was working fine, but after testing a table with columns containing a dictionary (map) the connection stopped working.

AronsonDan commented 1 month ago

Would love a fix on that as well

AronsonDan commented 1 month ago

Would love a review on that PR :-)

https://github.com/airbytehq/airbyte/pull/42398