dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0
2.38k stars 154 forks source link

Error when using refresh="drop_resources" and change the write_disposition from append to merge #1801

Open VioletM opened 2 weeks ago

VioletM commented 2 weeks ago

dlt version

0.9.9a1

Describe the problem

If you want to change the write disposition from append to merge, and you want to do it with refresh="drop_resources", you'll get an error because it's trying to remove staging tables that are not exist.

<class 'dlt.destinations.exceptions.DatabaseUndefinedRelation'>
Catalog Error: Table with name _dlt_version does not exist!
Did you mean "hospital_data_pipeline.hospital_data._dlt_version"?
LINE 1: DELETE FROM "hospital_data_staging"."_dlt_version" ...

Expected behavior

It will try to remove staging tables, but if there are no staging tables, it will ignore the error.

Steps to reproduce

stack trace

/Users/vmishechk/Library/Caches/pypoetry/virtualenvs/dlt-verified-sources-5afoIPXt-py3.9/bin/python /Users/vmishechk/PycharmProjects/dlt/bya/test.py 
2024-09-11 18:57:06,810|[WARNING]|1930|8079056576|dlt|logger.py|wrapper:25|The pipeline `run` method will now load the pending load packages. The data you passed to the run function will not be loaded. In order to do that you must run the pipeline again
2024-09-11 18:57:06,827|[WARNING]|1930|8079056576|dlt|utils.py|verify_schema_merge_disposition:118|Table encounters has `write_disposition` set to `merge` and `merge_strategy` set to `delete-insert`, but no primary or merge keys defined. dlt will fall back to `append` for this table.
Traceback (most recent call last):
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/sql_client.py", line 321, in _wrap_gen
    return (yield from f(self, *args, **kwargs))
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/impl/duckdb/sql_client.py", line 133, in execute_query
    raise outer
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/impl/duckdb/sql_client.py", line 128, in execute_query
    self._conn.execute(query, db_args)
duckdb.duckdb.CatalogException: Catalog Error: Table with name _dlt_version does not exist!
Did you mean "hospital_data_pipeline.hospital_data._dlt_version"?
LINE 1: DELETE FROM "hospital_data_staging"."_dlt_version" ...
                    ^

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 572, in load
    runner.run_pool(load_step.config, load_step)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/common/runners/pool_runner.py", line 89, in run_pool
    while _run_func():
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/common/runners/pool_runner.py", line 82, in _run_func
    run_metrics = run_f.run(cast(TExecutor, pool))
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/load/load.py", line 637, in run
    self.load_single_package(load_id, schema)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/load/load.py", line 526, in load_single_package
    applied_update = init_client(
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/load/utils.py", line 136, in init_client
    _init_dataset_and_update_schema(
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/load/utils.py", line 168, in _init_dataset_and_update_schema
    job_client.drop_tables(*drop_tables, delete_schema=True)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/job_client_impl.py", line 202, in drop_tables
    self._delete_schema_in_storage(self.schema)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/job_client_impl.py", line 639, in _delete_schema_in_storage
    self.sql_client.execute_sql(f"DELETE FROM {name} WHERE {c_schema_name} = %s;", schema.name)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/impl/duckdb/sql_client.py", line 112, in execute_sql
    with self.execute_query(sql, *args, **kwargs) as curr:
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/sql_client.py", line 323, in _wrap_gen
    raise self._make_database_exception(ex)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/destinations/impl/duckdb/sql_client.py", line 155, in _make_database_exception
    raise DatabaseUndefinedRelation(ex)
dlt.destinations.exceptions.DatabaseUndefinedRelation: Catalog Error: Table with name _dlt_version does not exist!
Did you mean "hospital_data_pipeline.hospital_data._dlt_version"?
LINE 1: DELETE FROM "hospital_data_staging"."_dlt_version" ...
                    ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/vmishechk/PycharmProjects/dlt/bya/test.py", line 10, in <module>
    info = pipeline.run(reader, write_disposition="merge", refresh="drop_resources")
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 223, in _wrap
    step_info = f(self, *args, **kwargs)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 268, in _wrap
    return f(self, *args, **kwargs)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 695, in run
    return self.load(destination, dataset_name, credentials=credentials)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 223, in _wrap
    step_info = f(self, *args, **kwargs)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 163, in _wrap
    return f(self, *args, **kwargs)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 268, in _wrap
    return f(self, *args, **kwargs)
  File "/Users/vmishechk/PycharmProjects/dlt/dlt/pipeline/pipeline.py", line 579, in load
    raise PipelineStepFailed(
dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage load when processing package 1726073808.0531359 with exception:

<class 'dlt.destinations.exceptions.DatabaseUndefinedRelation'>
Catalog Error: Table with name _dlt_version does not exist!
Did you mean "hospital_data_pipeline.hospital_data._dlt_version"?
LINE 1: DELETE FROM "hospital_data_staging"."_dlt_version" ...

Operating system

macOS

Runtime environment

Local

Python version

3.9

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

No response