airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destinations should support timestamp with millisecond precision #8904

Open tuliren opened 2 years ago

tuliren commented 2 years ago

Summary

Currently the Postgres source returns timestamps with second precision. This causes a problem when a timestamp column with millisecond precision is used as the cursor in an incremental sync. The data persisted on the destination side has second precision, while the original data in the database has millisecond precision. Consequently, the timestamp in the original data is always newer than the one synced to the destination because of the extra millisecond values. So the sync is always triggered even when there is no new data.
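To illustrate the mechanism described above, here is a minimal sketch in Python. It is not Airbyte's actual code: the helper names (`truncate_to_seconds`, `rows_to_sync`), the sample timestamps, and the strict `>` cursor comparison are all assumptions made for the example. It only shows why a cursor saved with second precision keeps selecting rows whose source timestamps carry milliseconds.

```python
from datetime import datetime

def truncate_to_seconds(ts: datetime) -> datetime:
    # Hypothetical helper: mimics a connector that drops sub-second precision.
    return ts.replace(microsecond=0)

def rows_to_sync(rows, cursor):
    # Hypothetical incremental filter: select rows strictly newer than the cursor.
    return [r for r in rows if cursor is None or r > cursor]

# Source rows with millisecond precision (made-up sample values).
source_rows = [
    datetime(2022, 1, 1, 12, 0, 0, 123000),
    datetime(2022, 1, 1, 12, 0, 0, 456000),
]

# First sync: no cursor yet, so every row is read.
synced = rows_to_sync(source_rows, None)

# Cursor saved with second precision versus full precision.
cursor_truncated = truncate_to_seconds(max(synced))
cursor_full = max(synced)

# Second sync with no new data in the source.
resynced_truncated = rows_to_sync(source_rows, cursor_truncated)
resynced_full = rows_to_sync(source_rows, cursor_full)

print(len(resynced_truncated))  # rows re-read with the truncated cursor
print(len(resynced_full))       # rows re-read with the full-precision cursor
```

With the truncated cursor, both rows still compare as newer and are read again. With the full-precision cursor, nothing is selected.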

Slack thread.

TODOs


ameyabapat-bsft commented 2 years ago

Do we need to add snowflake-source to the "Update the following source databases to support millisecond precision timestamp" category? I found a similar issue: https://github.com/airbytehq/airbyte/issues/9915

ameyabapat-bsft commented 2 years ago

@alafanechere @tuliren any updates on source snowflake (https://github.com/airbytehq/airbyte/issues/9915)? This issue is magnified in our use cases, where a large data dump (10-100k rows) is added at the source in a single operation. That gives many rows the same timestamp, and all of them are resynced in the next sync operation.

grishick commented 2 years ago

Sources have been fixed. Converting this to a destination-specific issue. Next step: create an issue for each destination that is capable of supporting millisecond precision.

tuliren commented 2 years ago

@alexandr-shegeda, is GL working on the destination changes as well?