The GCS destination does not properly apply the Overwrite strategy in certain circumstances.
Specifically, existing files are not removed from the GCS path before the sync, which leaves duplicate files after the sync.
In the GCS destination configuration settings, when filling in the GCS Bucket Path value:
with a simple path like my_folder/my_subfolder: the Overwrite strategy works as expected
with a dynamic path like my_folder/${STREAM_NAME}: the Overwrite strategy does not work as expected, as explained above.
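To make the expected behavior concrete, here is a minimal sketch of what I would expect Overwrite to do with a dynamic path: resolve the placeholder for each stream, then drop everything under the resolved prefix before writing. This is not the connector's actual code; `resolve_bucket_path` and `keys_to_delete` are hypothetical names used only for illustration, and the object keys are made up.

```python
from string import Template


def resolve_bucket_path(path_template: str, stream_name: str) -> str:
    """Replace ${STREAM_NAME} in the configured bucket path (hypothetical helper)."""
    # safe_substitute leaves any other placeholder untouched instead of raising.
    return Template(path_template).safe_substitute(STREAM_NAME=stream_name)


def keys_to_delete(existing_keys, path_template, stream_name):
    """Return the existing object keys an Overwrite sync would be expected to remove."""
    # Append "/" so that "my_folder/users" does not also match "my_folder/users_archive".
    prefix = resolve_bucket_path(path_template, stream_name).rstrip("/") + "/"
    return [key for key in existing_keys if key.startswith(prefix)]


if __name__ == "__main__":
    existing = [
        "my_folder/users/old.csv",
        "my_folder/orders/old.csv",
        "my_folder/users_archive/x.csv",
    ]
    print(keys_to_delete(existing, "my_folder/${STREAM_NAME}", "users"))
```

If the cleanup step used the unresolved template (or skipped resolution entirely), no existing key would match the prefix, which would be consistent with the files not being dropped. That is only a guess at the cause.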
Reproducing the issue is simple:
set up a GCS destination with a simple path as the GCS Bucket Path value
create a connection using this GCS destination (the source does not matter) with the Overwrite strategy
sync the connection => a file is generated, as expected
re-sync the connection => the file is dropped first, then re-generated, as expected
update the GCS Bucket Path value of the GCS destination to a dynamic path
re-sync the connection => the file is NOT dropped first and a new one is generated, leading to duplicates
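To check the result of the steps above, a small helper like the following can flag folders that hold more files than expected after an Overwrite sync. It is only a sketch: it assumes one output file per stream folder per sync, the function names are hypothetical, and the object names would come from a bucket listing (for example `gsutil ls -r`), which is not shown here.

```python
from collections import Counter
import posixpath


def files_per_folder(object_names):
    """Count objects per parent folder, given full object names from a bucket listing."""
    return Counter(posixpath.dirname(name) for name in object_names)


def folders_with_duplicates(object_names, expected_files=1):
    """Return folders holding more files than an Overwrite sync should leave behind."""
    counts = files_per_folder(object_names)
    return sorted(folder for folder, n in counts.items() if n > expected_files)


if __name__ == "__main__":
    # Made-up listing: two files left in the "users" folder after a re-sync.
    listing = [
        "my_folder/users/first_sync.csv",
        "my_folder/users/second_sync.csv",
        "my_folder/orders/second_sync.csv",
    ]
    print(folders_with_duplicates(listing))
```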
Connector Name
destination-google-cloud-storage
Connector Version
0.4.4
What step the error happened?
During the sync
Relevant information
Hello,
Thank you for your help 👍
Relevant log output
No response
Contribute