duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0

Delta on Azure / Account key leak #363

Closed (mycaule closed this issue 3 months ago)

mycaule commented 3 months ago

azure_storage_account_key is leaked when generating documentation with dbt docs generate.

The value is shown in the table details when viewing the HTML documentation.

Here is my configuration, following @milicevica23's repo, which is also mentioned in the README of this repo.

I am also wondering whether there is a way to use a managed identity in storage_options, so that I can connect with just my Active Directory identity?

profiles.yml

outputs:
    dev_duckdb:
      type: duckdb
      path: /tmp/my-local.duckdb
      plugins:
        - module: delta

sources.yml

- name: my_delta_source
  config:
    plugin: delta
  tables:
    - name: my_table
      meta:
        delta_table_path: abfss://container@storage_account.dfs.core.windows.net/path/to/my_table
        storage_options:
          azure_storage_account_key: "{{ env_var('AZURE_STORAGE_ACCOUNT_KEY') }}"

~/.zshrc

export AZURE_STORAGE_ACCOUNT_KEY="abcd"

jwills commented 3 months ago

Hey @mycaule, you should use config instead of meta for secrets there: values under config won't be included in the docs, and everything else works the same way. Sorry about the confusion; I will update the docs to make "config" the default for this kind of setting.
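
One way to confirm that a key no longer leaks after moving it from meta to config is to scan the files that dbt docs generate writes (by default target/manifest.json, target/catalog.json and target/index.html) for the secret value. The following is a minimal sketch, not part of dbt or dbt-duckdb; find_leaks is a hypothetical helper name, and it only does a plain substring match, so it would miss a secret that had been escaped or encoded.

```python
import os
from pathlib import Path


def find_leaks(target_dir, secret):
    """Return sorted relative paths of files under target_dir containing secret.

    Hypothetical helper: a plain substring search over the generated files.
    """
    if not secret:
        raise ValueError("secret must be a non-empty string")
    leaked = []
    for path in Path(target_dir).rglob("*"):
        if not path.is_file():
            continue
        # The artifacts are JSON/HTML text; skip undecodable bytes instead of failing.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if secret in text:
            leaked.append(path.relative_to(target_dir).as_posix())
    return sorted(leaked)


if __name__ == "__main__":
    # Assumes the key is exported as in the ~/.zshrc snippet of this issue
    # and that dbt wrote its artifacts to the default "target" directory.
    key = os.environ.get("AZURE_STORAGE_ACCOUNT_KEY", "")
    if key:
        for name in find_leaks("target", key):
            print("secret found in", name)
```

Run it from the dbt project directory after dbt docs generate; an empty result means the value was not found verbatim in any generated file.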

jwills commented 3 months ago

Re: managed storage options for Azure, is that similar to the AWS credentials chain? (Apologies, I am not that familiar with Azure.)

mycaule commented 3 months ago

Thank you. Yes, there is a DefaultAzureCredential() concept, similar to the AWS credentials chain, and you can allow managed identities to access Blob storage so that no passwords are needed.

jwills commented 3 months ago

Yes that sounds superior, looking into it (read: "Asking GPT4 to write the code for me")

mycaule commented 3 months ago

It worked this way.

Maybe I can help update the README or the developer documentation website, if there is one?

sources.yml

version: 2

sources:
  - name: my_delta_source
    config:
      plugin: delta
      storage_options:
        azure_use_azure_cli: "True"

    tables:
      - name: table1
        config:
          delta_table_path: abfss://container@account_name.dfs.core.windows.net/path/to/table1

      - name: table2
        config:
          delta_table_path: abfss://container@account_name.dfs.core.windows.net/path/to/table2

See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variant.UseAzureCli

milicevica23 commented 3 months ago

Hi @mycaule, thank you for pointing that out. As you noted above, https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants should list all the configuration keys that can be used to access Azure object storage. The only caveat is that if you set them as environment variables, the names must be written in upper case.
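
The naming relationship described above (lower-case keys in storage_options, upper-case names as environment variables) can be illustrated with a short sketch. azure_storage_options is a hypothetical helper, not part of dbt-duckdb or deltalake; it assumes that every variable of interest starts with AZURE_, and which names are actually recognised is defined by the AzureConfigKey documentation linked above.

```python
import os


def azure_storage_options(environ=None, prefix="AZURE_"):
    """Collect AZURE_* environment variables as lower-case storage_options keys.

    Hypothetical helper: it copies every variable with the given prefix and
    does not check whether the resulting key is a valid AzureConfigKey.
    """
    env = os.environ if environ is None else environ
    return {
        name.lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


if __name__ == "__main__":
    # Print only the key names, never the values, to avoid leaking secrets.
    for key in sorted(azure_storage_options()):
        print(key)
```

For example, an exported AZURE_STORAGE_ACCOUNT_KEY shows up here as azure_storage_account_key, the same spelling used under storage_options in sources.yml.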

I am currently on a trip, so I will update my repo next week with more examples.