duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
788 stars 70 forks source link

Add support for new DuckDB Secrets manager #403

Open guenp opened 2 weeks ago

guenp commented 2 weeks ago

This adds support for the new DuckDB secrets manager. Example usage:

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      secrets:
        - type: s3
          key_id: abc
          secret: xyz
          region: us-west-2
  target: dev
guenp commented 1 week ago

Note that glue still implements the s3_... settings variables. I'm not really sure how to tackle that - Plugin currently only takes settings in the BasePlugin.create factory method. We could add secrets there, or just pass the creds. Thoughts?

jwills commented 1 week ago

@guenp wanted to clarify the issue with the glue plugin; one could still specify the necessary S3 info in the settings, right? But if you wanted to e.g. use the credential provider chain in conjunction with the Glue plugin, that would fail?

guenp commented 1 week ago

@guenp wanted to clarify the issue with the glue plugin; one could still specify the necessary S3 info in the settings, right? But if you wanted to e.g. use the credential provider chain in conjunction with the Glue plugin, that would fail?

Yes, that's right, because I've deprecated the _load_aws_credentials method, use_credential_provider no longer uses boto3 to fetch the credentials from aws SSO. If you use the credential provider, the key_id and secret are no longer explicitly stored in memory in Python, instead it's handled by DuckDB internally. So the glue plugin can therefore also no longer access them. If we want to use the credential provider with glue plugin we'd need to migrate the old code from _load_aws_credentials into there, I believe.

Additionally, if people do specify the key_id and secret explicitly, they will now do so via the secrets key, not the settings. So we need the glue plugin to be able to access the secrets somehow, right now it can only access settings. I think it's possible to do that, but am curious wdyt?

jwills commented 1 week ago

Ah so this wouldn't be backwards compatible, then-- folks would need to cut over to the new way. I'm assuming there's got to be a similar deprecation schedule for upstream DuckDB, right?

guenp commented 1 week ago

Ah so this wouldn't be backwards compatible, then-- folks would need to cut over to the new way. I'm assuming there's got to be a similar deprecation schedule for upstream DuckDB, right?

Upstream DuckDB has named it "legacy authentication scheme for S3 API", but not sure what the deprecation schedule looks like: https://duckdb.org/docs/extensions/httpfs/s3api_legacy_authentication.html

glue will work if you use the legacy S3 auth, but in this PR it currently doesn't work with the new secrets manager.

Couple questions just to make sure:

  1. Do we want to keep supporting legacy authentication S3 at least until DuckDB deprecates it, or do we want to switch everyone over to the new secrets manager? If so, I can put the _load_aws_credentials method & usage thereof back in, and throw an exception if S3 secrets are redundantly defined in the settings and in secrets.
  2. Do we want glue to work with the new secrets manager, or shall we keep it on the legacy auth and tell people to keep using that?
guenp commented 1 week ago

use_credential_provider no longer uses boto3 to fetch the credentials from aws SSO.

This is actually incorrect, I checked and boto3 can fetch the creds from AWS SSO itself. So this is no longer an issue.

I'm going to move forward with making glue compatible with the Secrets Manager, and will add in a test to make sure S3 legacy auth is still supported.