dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0
2.76k stars 182 forks source link

Layout config is not respected in `filesystem` destination when using an `sql_database` source #2107

Open trymzet opened 4 days ago

trymzet commented 4 days ago

dlt version

1.4.0

Describe the problem

When using the Filesystem destination with eg. the following layout config:

[destination.filesystem]
bucket_url = ""
layout = "{schema_name}/{table_name}/{load_id}.{file_id}.{ext}"

The data is still loaded into {schema_name}/sql_database/{table_name}/{load_id}.{file_id}.{ext} (notice that a hardcoded sql_database directory is unexpectedly inserted by dlt).

Expected behavior

I think:

  1. This name should be controllable (eg, if I have multiple SQL databases, I want to use specific db name instead of the generic "sql_database"). This might be part of the broader issue that currently, per-database configuration is not supported by dlt for sql_database sources (you either have one sql_database config or per-pipeline configs).
  2. This behavior should be documented in the layout docs.
  3. Or, the layout should be applied as specified in the config.
trymzet commented 22 hours ago

I've also noticed the layout produced when using sql_table() with the same config is different still -- the produced layout is {schema_name}_<some_unidentified_datetime_value>/dlt_{table_name}/{load_id}.{file_id}.

As you can see, some datetime string (I don't know what it is, but it looks like this for example: my_schema_20241202113340) is added as a suffix to the schema name, and the table name is prefixed with dlt_.