dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0
1.97k stars, 116 forks

Schema Conflict on Source Data Manipulation #1525

Open itopaloglu83 opened 5 days ago

itopaloglu83 commented 5 days ago

dlt version

v0.4.12

Describe the problem

After following the walkthrough tutorial, I started experimenting with source data manipulation (adding, removing, and transforming columns) by following the "how to remove columns" tutorial. I had converted some array columns to strings, so they no longer needed to be expanded into separate tables. After that, even though I deleted all destination tables and schemas in the Postgres database, I kept getting an error about not being able to delete from a table that no longer existed.
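
For reference, the kind of change described above can be sketched in plain Python. This is a minimal illustration, not the exact code from the affected project: the function names `flatten_hobbies` and `drop_hobbies` are hypothetical, and the record shape (a `hobbies` list on each item) mirrors the sample data in the reproduction steps. In dlt, per-item functions like these are normally attached to a resource with `add_map`, as in the remove-columns tutorial, but that wiring is left out here so the sketch runs on its own.

```python
from typing import Any, Dict


def flatten_hobbies(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with a list-valued `hobbies` joined into one string."""
    out = dict(record)  # shallow copy so the caller's record is not modified
    hobbies = out.get("hobbies")
    if isinstance(hobbies, list):
        out["hobbies"] = ", ".join(str(h) for h in hobbies)
    return out


def drop_hobbies(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without the `hobbies` field."""
    out = dict(record)
    out.pop("hobbies", None)  # no error if the field is already absent
    return out


sample = {"name": "Alice", "age": 30, "hobbies": ["Reading", "Painting"]}
print(flatten_hobbies(sample))  # {'name': 'Alice', 'age': 30, 'hobbies': 'Reading, Painting'}
print(drop_hobbies(sample))     # {'name': 'Alice', 'age': 30}
```

Either change means the source no longer produces a nested list, which is why a separate hobbies table is no longer needed after the change.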

At the time I couldn't figure out the cause, so I created a brand new project to work around it. The documentation is quite good, but the cache folders are a little buried, and the handling of source schema changes could be improved. Maybe the docs could at least mention the cached values?

Expected behavior

No response

Steps to reproduce

I’m on mobile right now; I’ll need to get back to you with the exact reproduction steps.

Operating system

macOS

Runtime environment

Local

Python version

3.10

dlt data source

REST API

dlt destination

Postgres

Other deployment details

No response

Additional information

DLT is awesome, thank you!

itopaloglu83 commented 4 days ago

Here are the reproduction steps:

  1. Create a pipeline that loads the following data and creates the persons and hobbies tables. I used the REST API source and the Postgres destination with the merge option.
    [
      {
        "name": "Alice",
        "age": 30,
        "hobbies": ["Reading", "Painting"]
      },
      {
        "name": "Bob",
        "age": 25,
        "hobbies": ["Gardening", "Cooking"]
      },
      {
        "name": "Charlie",
        "age": 35,
        "hobbies": ["Photography", "Hiking"]
      }
    ]
  2. Delete all the tables from all the schemas, both target and staging.
  3. Remove the hobbies element from the source data, or replace it with a single concatenated string of hobbies.
  4. Run the pipeline; you should get an error saying it is unable to delete from the hobbies table.
  5. Remove the cached pipeline data from ~/.dlt/pipelines and the pipeline will succeed.
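
The workaround in step 5 can be scripted. The following is a minimal sketch using only the Python standard library, not a dlt API: the helper name `clear_pipeline_cache` is hypothetical, and it assumes each pipeline keeps its cached data in a subfolder of `~/.dlt/pipelines` named after the pipeline. Check the folder contents before deleting anything, since this also discards any other locally stored pipeline data.

```python
import shutil
from pathlib import Path
from typing import Optional


def clear_pipeline_cache(pipeline_name: str, pipelines_dir: Optional[Path] = None) -> bool:
    """Delete the local cache folder of one pipeline.

    Assumption: the cache lives in <pipelines_dir>/<pipeline_name>,
    with pipelines_dir defaulting to ~/.dlt/pipelines.
    Returns True if a folder was removed, False if none existed.
    """
    base = pipelines_dir if pipelines_dir is not None else Path.home() / ".dlt" / "pipelines"
    target = base / pipeline_name

    # Refuse names such as "../x" or "a/b" that would point outside the base folder.
    if target.resolve().parent != base.resolve():
        raise ValueError(f"unexpected pipeline name: {pipeline_name!r}")

    if not target.is_dir():
        return False

    shutil.rmtree(target)
    return True
```

For example, `clear_pipeline_cache("my_pipeline")` would remove `~/.dlt/pipelines/my_pipeline` if it exists, where `my_pipeline` stands in for whatever name was passed when the pipeline was created.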