dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

Progress log percentage doubled by merge jobs in sql_database #2064

Open. FridayPush opened this issue 2 days ago.


dlt version

dlt 1.3.0

Describe the problem

During a load operation, the job progress reported by the dlt.progress.log collector climbs past 100%. For example:

-------------------- Load sql_database in 1731600286.779139 --------------------
Jobs: 8/18 (44.4%) | Time: 47.06s | Rate: 0.17/s

-------------------- Load sql_database in 1731600286.779139 --------------------
Jobs: 17/18 (94.4%) | Time: 68.30s | Rate: 0.25/s

-------------------- Load sql_database in 1731600286.779139 --------------------
Jobs: 24/18 (133.3%) | Time: 94.54s | Rate: 0.25/s
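The numbers in these lines are consistent with percentage = completed jobs / total jobs × 100 and rate = completed jobs / elapsed seconds. The sketch below reproduces all three lines under that assumption; format_progress is a hypothetical helper written for illustration, not dlt's actual LogCollector code.

```python
def format_progress(count: int, total: int, elapsed_s: float) -> str:
    """Format a progress line in the style of the log above (assumed formulas)."""
    percentage = count / total * 100  # completed jobs as a share of the expected total
    rate = count / elapsed_s  # completed jobs per second since the load started
    return (
        f"Jobs: {count}/{total} ({percentage:.1f}%) | "
        f"Time: {elapsed_s:.2f}s | Rate: {rate:.2f}/s"
    )


# (completed jobs, elapsed seconds) taken from the three log entries above
for count, elapsed in [(8, 47.06), (17, 68.30), (24, 94.54)]:
    print(format_progress(count, 18, elapsed))
```

Because nothing caps count at total, any double counting of completed jobs shows up directly as a percentage above 100%.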

Expected behavior

The last entry in the log above shows 24/18; at that point it should show 12/18. The completed-job count appears to be doubled.
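As an illustration of the suspected doubling, here is a minimal sketch of a counter that overshoots its total when every finished job is reported twice, and of how counting unique job IDs would keep it at 12/18. JobProgress, report_done, line and the job_N IDs are hypothetical names; this does not describe how dlt's loader actually tracks merge jobs.

```python
class JobProgress:
    """Hypothetical job counter for illustration; not dlt's implementation."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.raw_count = 0  # incremented on every completion report
        self.finished: set[str] = set()  # unique job IDs reported so far

    def report_done(self, job_id: str) -> None:
        self.raw_count += 1
        self.finished.add(job_id)

    def line(self, deduplicate: bool) -> str:
        # Use the unique-ID count when deduplicating, else the raw report count
        count = len(self.finished) if deduplicate else self.raw_count
        return f"Jobs: {count}/{self.total} ({count / self.total * 100:.1f}%)"


progress = JobProgress(total=18)
# Assumption: each of 12 finished jobs reports completion twice
for i in range(12):
    progress.report_done(f"job_{i}")
    progress.report_done(f"job_{i}")

print(progress.line(deduplicate=False))  # Jobs: 24/18 (133.3%)
print(progress.line(deduplicate=True))   # Jobs: 12/18 (66.7%)
```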

Steps to reproduce

This happened while using the sql_database source to load from a SQL Server database into a Redshift destination. Sample code:

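import logging
import dlt
from dlt.sources.sql_database import sql_database

# connection_string, credentials and dest_schema are placeholders defined elsewhere
destination = dlt.destinations.redshift(
    staging_dataset_name_layout="z_staging",
    credentials=connection_string,
)
pipeline = dlt.pipeline(
    pipeline_name="my_sync", destination=destination, dataset_name=dest_schema,
    # log progress every 20 seconds at INFO level, without system stats
    progress=dlt.progress.log(log_period=20, log_level=logging.INFO, dump_system_stats=False),
)

source_1 = sql_database(credentials, reflection_level="full_with_precision")
info = pipeline.run(source_1, write_disposition="merge")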

Operating system

Linux, macOS

Runtime environment

Local

Python version

3.11

dlt data source

sql_database

dlt destination

Amazon Redshift

Other deployment details

No response

Additional information

No response