duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0

Allow arbitrary extensions #157

Open davidgasquez opened 1 year ago

davidgasquez commented 1 year ago

It would be cool to be able to install third-party extensions like duckdb-python-udf from dbt-duckdb. Perhaps something like this?

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      extensions:
        - httpfs
        - parquet
        - python_udf@net.ednit.duckdb-extensions.s3.us-west-2.amazonaws.com
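
To make the proposed `name@repository` syntax concrete, here is a minimal Python sketch of how such an entry might be split and turned into DuckDB statements. It is not how dbt-duckdb implements anything: the helper names `parse_extension` and `extension_statements` are hypothetical, and the use of DuckDB's `custom_extension_repository` setting to point at the third-party bucket is an assumption about how the repository part would be applied.

```python
from typing import List, Optional, Tuple


def parse_extension(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name@repository' into (name, repository).

    The repository is None for plain entries such as 'httpfs'.
    """
    name, sep, repository = spec.partition("@")
    if not name or (sep and not repository):
        raise ValueError(f"invalid extension spec: {spec!r}")
    return name, (repository or None)


def extension_statements(spec: str) -> List[str]:
    """Build the SQL statements that would install and load one entry."""
    name, repository = parse_extension(spec)
    statements = []
    if repository is not None:
        # Assumption: the custom repository is selected via this setting
        # before INSTALL runs.
        statements.append(f"SET custom_extension_repository = '{repository}'")
    statements.append(f"INSTALL {name}")
    statements.append(f"LOAD {name}")
    return statements
```

A plain entry such as `httpfs` would yield only the `INSTALL` and `LOAD` statements, while the `python_udf@...` entry would first switch the repository. Loading from a third-party repository would still need unsigned extensions to be allowed.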

jwills commented 1 year ago

ooh, that is nice syntax-- will do!

jwills commented 1 year ago

@markroddy do you have thoughts on how to configure this, or does David's syntax look right to you? Anything else I would need to do?

MarkRoddy commented 1 year ago

Two things you may or may not want to account for:

  1. Pinning an extension to a specific version number: I think this is kind of nebulous in DuckDB itself (or undocumented), because I don't know if there's a way to explicitly load an extension at a specific version number. However, when you load an extension it looks up a well-defined path in S3, and that path includes a version number. There's a reference to this in the docs on manually installing extensions. If there's no way to pin it in DuckDB itself, this project could just download the extensions, or, also very reasonable, completely punt on it until support is added. Even without current support, though, I think it's worth thinking about how to support this in the future, because I have to imagine that's a feature that'll show up eventually, and config syntax is so hard to change.
  2. Should the configuration syntax require explicitly setting the 'unsigned' option? This is required for using 3rd party extensions. The need for this setting could be inferred from the presence of the S3 bucket, so it's not strictly necessary to state it. However, I wonder if it might make some people uncomfortable not to realize that this is happening on their behalf, and an explicit setting might make them feel more warm and fuzzy.
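
As a rough illustration of the first point, the versioned lookup path could be built by hand if the project chose to download extensions itself. This is only a sketch: the function name `extension_url` is hypothetical, and the `<repository>/<version>/<platform>/<name>.duckdb_extension.gz` layout is an assumption based on the DuckDB docs on manually installing extensions, so it should be checked against the DuckDB version in use. The version and platform strings in the example are illustrative.

```python
def extension_url(
    name: str,
    duckdb_version: str,
    platform: str,
    repository: str = "extensions.duckdb.org",
) -> str:
    """Build the download URL for one extension build.

    Assumed layout: http://<repository>/<version>/<platform>/<name>.duckdb_extension.gz
    """
    return (
        f"http://{repository}/{duckdb_version}/{platform}/"
        f"{name}.duckdb_extension.gz"
    )


# Illustrative values only: pin httpfs to a specific DuckDB release.
url = extension_url("httpfs", "v0.8.1", "linux_amd64")
```

Note that the version in this path is the DuckDB release the extension was built for, which is why pinning an extension version independently of DuckDB is the nebulous part.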

MarkRoddy commented 1 year ago

PS - these are somewhat philosophical future proof-y thoughts. On the whole this looks great and super helpful!

jwills commented 1 year ago

I support setting the unsigned extensions option explicitly on startup via the (undocumented and poorly named) config_options setting on the profile: https://github.com/jwills/dbt-duckdb/blob/master/dbt/adapters/duckdb/environments/__init__.py#LL55C27-L55C27

...to your point tho, downloading and installing the extension locally is also an option, and might be the best way to handle this while the spec here is still in flux.
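
For reference, the "allow unsigned, then load a locally downloaded file" route could look roughly like this outside of dbt. It is a sketch under assumptions: the helper names and the example path are hypothetical, and the only DuckDB pieces relied on are the `allow_unsigned_extensions` config option and `LOAD` with a file path.

```python
from typing import Dict


def unsigned_config() -> Dict[str, str]:
    """Connection config that permits loading unsigned (third-party) extensions."""
    return {"allow_unsigned_extensions": "true"}


def load_local_extension_sql(path: str) -> str:
    """Build a LOAD statement for an extension file already on disk."""
    quoted = path.replace("'", "''")  # escape single quotes for the SQL literal
    return f"LOAD '{quoted}'"


# With the duckdb package installed, usage would be roughly:
#   import duckdb
#   con = duckdb.connect("/tmp/dbt.duckdb", config=unsigned_config())
#   con.execute(load_local_extension_sql("/tmp/python_udf.duckdb_extension"))
```

The unsigned option has to be supplied when the connection is opened, which matches setting it on startup through the profile rather than inside a model.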