airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.29k stars 3.95k forks

[platform] error on streams with source defined cursors where the default cursor field is a nested property #36253

Open cjwooo opened 5 months ago

cjwooo commented 5 months ago

Platform Version

0.51.0, 0.53.1

What step the error happened?

Other

Relevant information

I have a custom source that contains a stream with source_defined_cursor = true and default_cursor_field = ['computedProperties', 'updatedAt']. After creating the source and destination via the Airbyte UI, the New connection page throws an error:

[image: screenshot of the error on the New connection page]
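For reference, here is a trimmed sketch of how the stream would look in the discover output, written as a Python dict. The stream name, source_defined_cursor, default_cursor_field, and the two schema fields come from this report; the supported_sync_modes value is an assumption, and the rest of the schema is omitted.

```python
import json

# Trimmed sketch of the stream in the source's discover output.
# `supported_sync_modes` is an assumption; the schema is reduced to two
# of the fields that appear in the log output of this report.
stream = {
    "name": "pipelines",
    "json_schema": {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "computedProperties": {
                "type": "object",
                "properties": {"updatedAt": {"type": "string"}},
            },
        },
    },
    "supported_sync_modes": ["full_refresh", "incremental"],  # assumption
    "source_defined_cursor": True,
    # Each list element is one step into the nested record.
    "default_cursor_field": ["computedProperties", "updatedAt"],
}

print(json.dumps(stream, indent=2))
```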

No error is surfaced in the logs shown by the UI. In the docker-compose logs, however, I can see that Airbyte failed while validating the source's discover output: Source defined cursor validation failed for stream: pipelines. Error: key: 'updatedAt' of path: '[computedProperties, updatedAt]' not found in schema: {...}.

The full message is in the "Relevant log output" section. Towards the end of it, you can see that the stream schema does contain an updatedAt field, nested inside the computedProperties object property.
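To make the expectation concrete, here is a minimal sketch of a nested lookup that does find the cursor field in this schema. It is not Airbyte's validation code: resolve_cursor_path is a hypothetical helper, and the schema is trimmed from the log output. Whether the platform's validator checks each key against the top-level properties only is an assumption, but a flat lookup like that would fail in the way the error message describes.

```python
from typing import Any, Optional

# Top-level `properties` of the stream schema, trimmed from the log output.
properties = {
    "id": {"type": "string"},
    "updated_at": {"type": "string"},
    "computedProperties": {
        "type": "object",
        "properties": {"updatedAt": {"type": "string"}},
    },
}


def resolve_cursor_path(props: dict, path: list) -> Optional[Any]:
    """Hypothetical helper: walk `path` through nested `properties`.

    Returns the schema of the final key, or None if any step is missing.
    """
    current = props
    node = None
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        node = current[key]
        # Descend into the nested object's own `properties`, if it has any.
        current = node.get("properties") if isinstance(node, dict) else None
    return node


# Nested lookup: follows computedProperties -> updatedAt.
nested = resolve_cursor_path(properties, ["computedProperties", "updatedAt"])

# Flat lookup (assumed failure mode): the key is absent at the top level.
flat_found = "updatedAt" in properties

print(nested)
print(flat_found)
```

The nested walk resolves the path one level at a time, while the flat check only sees the top-level keys, where updatedAt does not exist (only updated_at does).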

This issue does not occur on our currently deployed Airbyte version, 0.50.31. I was able to reproduce it with a Docker deployment on versions 0.51.0 and 0.53.1.

Relevant log output

Caused by: io.temporal.failure.ApplicationFailure: message='Source defined cursor validation failed for stream: pipelines. Error: key: 'updatedAt' of path: '[computedProperties, updatedAt]' not found in schema: {"id":{"type":"string"},"errors":{"type":"array","items":{}},"project_slug":{"type":"string"},"updated_at":{"type":"string"},"number":{"type":"number"},"state":{"type":"string"},"created_at":{"type":"string"},"trigger":{"type":"object","properties":{"received_at":{"type":"string"},"type":{"type":"string"},"actor":{"type":"object","properties":{"login":{"type":"string"},"avatar_url":{"type":["string","null"]}}}}},"vcs":{"type":"object","properties":{"origin_repository_url":{"type":"string"},"target_repository_url":{"type":"string"},"revision":{"type":"string"},"provider_name":{"type":"string"},"branch":{"type":"string"}}},"workflows":{"type":"array","items":{"type":"object","properties":{"pipeline_id":{"type":"string"},"id":{"type":"string"},"name":{"type":"string"},"project_slug":{"type":"string"},"status":{"type":"string"},"started_by":{"type":"string"},"pipeline_number":{"type":"number"},"created_at":{"type":"string"},"stopped_at":{"type":"string"},"jobs":{"type":"array","items":{"type":"object","properties":{"dependencies":{"type":"array","items":{}},"job_number":{"type":"number"},"id":{"type":"string"},"started_at":{"type":"string"},"name":{"type":"string"},"project_slug":{"type":"string"},"status":{"type":"string"},"type":{"type":"string"},"stopped_at":{"type":"string"}}}}}}},"computedProperties":{"type":"object","properties":{"updatedAt":{"type":"string"}}}}', type='io.airbyte.workers.exception.WorkerException', nonRetryable=false
cjwooo commented 5 months ago

I found an old thread on the Airbyte community Slack suggesting that nested cursor fields are not supported because of restrictions imposed by normalization. If this is true, the Airbyte Protocol documentation and the CDK should be updated, because both show nested fields as a valid option.

marcosmarxm commented 5 months ago

@natikgadzhi can you confirm whether this is an issue, and whether it is the responsibility of the extensibility team?