duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0

When using external JSON materialization: bumping into default maximum_object_size limit. #409

Open firewall413 opened 5 days ago

firewall413 commented 5 days ago

https://github.com/duckdb/dbt-duckdb/blob/ab16970ba9f616205dcae52a9dcb661c6d8836c6/dbt/include/duckdb/macros/materializations/external.sql#L52

When materializing a table to a JSON file bigger than 30MB, we bump into the following:

Invalid Input Error: "maximum_object_size" of 16777216 bytes exceeded while reading file "s3://xxxxxx.json" (>33554428 bytes). Try increasing "maximum_object_size".

This is likely due to the `select * from '{{ read_location }}'` statement building a view with the default read_json_auto() and its default option parameters.

Would it be possible to pass the read_json/read_parquet/read_csv function and its option parameters explicitly?
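For context, DuckDB's read_json_auto does accept a maximum_object_size parameter directly, so a view built from an explicit reader call (rather than the bare file path) would avoid the limit. A minimal sketch of what the generated view could look like if the reader and its options were overridable (the bucket path and model name are placeholders):

```sql
-- maximum_object_size is a real read_json/read_json_auto option
-- (default 16777216 bytes); path and view name are placeholders.
create or replace view my_model as
select *
from read_json_auto('s3://my-bucket/my_model.json',
                    maximum_object_size = 104857600);
```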

jwills commented 5 days ago

Yes, I think so; there would need to be a PR modifying this function to let you override more of the defaults via the rendered_options dictionary (as we already do for external materializations that use partitioning): https://github.com/duckdb/dbt-duckdb/blob/master/dbt/adapters/duckdb/impl.py#L166
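One possible shape for that change, sketched as Jinja in the macro itself. The `read_function` and `read_options` names here are purely illustrative config keys, not an existing dbt-duckdb API; the else branch keeps today's behavior:

```sql
{# Hypothetical sketch: fall back to the current bare-path view when no
   reader override is configured. `read_function` / `read_options` are
   illustrative names, not actual dbt-duckdb options. #}
{% if read_function %}
create or replace view {{ relation }} as
select * from {{ read_function }}('{{ read_location }}'{{ read_options }});
{% else %}
create or replace view {{ relation }} as
select * from '{{ read_location }}';
{% endif %}
```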