Open Nintorac opened 3 months ago
Hey @Nintorac, this is an implementation decision and not a bug. I agree, though, that we should probably add a note about it in the docs. Is the fact that the metadata tables are stored as jsonl posing a problem for you at this time?
Mainly my aversion to jsonl for now aha, but some issues I foresee
I would be interested to know why the metadata table write mechanism doesn't use the same pathway as the data table write, though. From my limited perspective it seems like this functionality should be implemented at the abstract destination level.
@Nintorac ok I understand. So you are actually reading the metadata files in your code? I was more or less working under the assumption that they are for internal dlt use only. But it is a fair point.
I was intending to use it for change data capture for scd2 type tables (since this isn't supported natively)
But I wasn't aware they were meant for internal use only.
I'd say they are not strictly meant for internal use, I just didn't expect anyone to want to query them in the way you describe. scd2 tables are currently not supported for the filesystem, by the way (although with the delta tables it should actually work). Could you explain in a bit more detail what you want to do? I'd like to understand the use case and maybe offer some help or take some inspiration for further work on the filesystem.
dlt version
dlt==0.5.1
Describe the problem
Configuring the preferred_loader_file_format for the filesystem destination does not respect the preferred_loader_file_format kwarg.
Further discussion here
Expected behavior
When configuring preferred_loader_file_format="parquet" I expect the metadata files to be in parquet format, instead they are jsonl.
Steps to reproduce
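One way to confirm the mismatch after a pipeline run is to list which file extensions ended up under each table directory. This is only a rough sketch using the standard library, not the reporter's reproduction steps and not part of dlt. It assumes the filesystem destination wrote to a local path with one directory per table under the dataset root; the helper name formats_by_table and the path in the usage line are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def formats_by_table(dataset_root):
    """Map each top-level directory under dataset_root to the file suffixes found in it.

    Assumption: each table (data or metadata) has its own directory
    directly under the dataset root.
    """
    root = Path(dataset_root)
    found = defaultdict(set)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            # Skip files that sit directly in the dataset root.
            continue
        table = rel.parts[0]
        # Only the last suffix is kept, e.g. "parquet" for "123.0.parquet".
        found[table].add(path.suffix.lstrip("."))
    return dict(found)


if __name__ == "__main__":
    # Hypothetical local dataset path; replace with the real one.
    for table, suffixes in sorted(formats_by_table("./my_bucket/my_dataset").items()):
        print(table, sorted(suffixes))
```

If the behavior is as described in this issue, the data tables should show parquet while the metadata tables (the ones prefixed with _dlt, assuming that naming) show jsonl.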
Operating system
Linux
Runtime environment
Local
Python version
3.10
dlt data source
No response
dlt destination
Filesystem & buckets
Other deployment details
No response
Additional information
No response