duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0

Support PyArrow Dataset as a valid return type for Python models #119

Closed: davidgasquez closed this issue 1 year ago

davidgasquez commented 1 year ago

From the docs:

The pyarrow.dataset module provides functionality to efficiently work with tabular, potentially larger than memory, and multi-file datasets.

Since the pyarrow.Table type won't work with larger-than-memory datasets, pyarrow.dataset might be a good alternative. I was also thinking about Polars, but wanted to check first whether this would be easier to implement.
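For illustration, here is a rough sketch of what such a Python model might look like. It assumes the adapter accepts a pyarrow.dataset.Dataset return value (which is what this issue asks for); the directory `data/events/` is a made-up path, not something from this project.

```python
def model(dbt, session):
    # Standard dbt Python model entry point: dbt passes in `dbt` and `session`.
    dbt.config(materialized="table")

    # Imported inside the function so pyarrow is only needed when the model runs.
    import pyarrow.dataset as ds

    # Hypothetical directory of Parquet files. A Dataset is a lazy handle over
    # the files, so nothing is loaded into memory at this point.
    return ds.dataset("data/events/", format="parquet")
```

Unlike returning a pyarrow.Table, this hands the adapter a reference to the files rather than fully materialized data.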

jwills commented 1 year ago

Hrm, it's possible that after @AlexanderVR's most recent PR, this will (almost) just work? We may need to generalize this check to look for pyarrow.dataset.Dataset as well?
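A generalized check along those lines could look roughly like the sketch below. This is not the adapter's actual code: `arrow_types` and `is_arrow_result` are hypothetical names used only for illustration, and treating pyarrow as an optional dependency is an assumption.

```python
import importlib


def arrow_types():
    """Return the Arrow classes to accept, or () if pyarrow is not installed."""
    try:
        pa = importlib.import_module("pyarrow")
        ds = importlib.import_module("pyarrow.dataset")
    except ImportError:
        return ()
    # Accept in-memory tables as well as (possibly larger-than-memory) datasets.
    return (pa.Table, ds.Dataset)


def is_arrow_result(obj):
    """True if obj is a pyarrow.Table or any pyarrow.dataset.Dataset subclass."""
    # isinstance() with an empty tuple is always False, so this degrades
    # gracefully when pyarrow is missing.
    return isinstance(obj, arrow_types())
```

Because the check uses isinstance against pyarrow.dataset.Dataset, subclasses such as file-system-backed or in-memory datasets would pass as well.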

davidgasquez commented 1 year ago

Sweet! It works when installing dbt-duckdb from GitHub. Thanks a lot @jwills!