duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
877 stars 78 forks source link

reading from encrypted parquet file #419

Open jb8628 opened 2 months ago

jb8628 commented 2 months ago

DuckDB allows use of encrypted Parquet files.

The key is set via a PRAGMA statement like PRAGMA add_parquet_key('key128', '0123456789112345');

On reading the syntax is read_parquet('file.parquet', encryption_config = {footer_key: 'key128'});

Is it possible to set the parquet key within dbt on using an encrypted parquet file as source, so that it can be accessed as follows?

meta: external_location: "read_parquet('file.parquet', encryption_config = {footer_key: 'key128'})"
formatter: oldstyle

I have tried config and dbt hooks without success.

jwills commented 2 months ago

It's not obvious to me why that wouldn't work with an on-run-start hook to set the key via the pragma statement and the external location as you had it configured; how does it fail? And what does the compiled output look like?

jb8628 commented 2 months ago

Thanks for your reply! It fails with:

Binder Error: No key with name "key128" exists. Add it with PRAGMA add_parquet_key('','');

on running a model using the source.

This is with on-run-start: "PRAGMA add_parquet_key('key128', '0123456789112345');" in dbt_project.yml.

The compiled output of the model looks correct but it does not seem to have access to the key. Am I doing something wrong with the hook?

jwills commented 2 months ago

Ah-- I bet the key is only cached on the specific connection that uses it-- try it with the keep_open: true setting enabled in your profile?

jb8628 commented 2 months ago

Yes, that works. Gives a good option for ingesting more sensitive data. Thank you so much for your help.

jb8628 commented 2 months ago

In fact, following your pointer, I now seem to have this working without keep_open: true by specifying +pre-hook: "PRAGMA add_parquet_key('key128', '0123456789112345');" for the model resource.