duckdb / duckdb_azure

Azure extension for DuckDB
MIT License
50 stars 17 forks source link

Add Workload Identity Credentials #83

Closed HynekBlaha closed 1 month ago

HynekBlaha commented 1 month ago

Hello, I would like to discuss adding option for using Workload Identity credentials. From my (very limited) point of view, adding support is straight-forward as the WorkloadIdentityCredential is part of DefaultAzureCredential since 1.6.0: https://github.com/Azure/azure-sdk-for-cpp/issues/4906

Notes:

Sorry if the code is not up to standards, any suggestions how to improve it are very welcome.

samansmink commented 1 month ago

Hey @HynekBlaha thanks for the PR!

We do really need to have some testing to be able to merge this PR though, at least some instructions on how we can test this, if setting that up in CI is tricky.

HynekBlaha commented 1 month ago

Understood. I would like to test it in real k8s cluster with provided workload identity.

The way I see it, I need to complete these steps: 1) cross-compile it to same arch as python-slim image has 2) upload built extension to public https (eg by commiting the executable to my github project) 3) load it in kubernetes 4) validate workload_identity works.

Do you have some example how to do step 1? Other steps should be a piece of cake.

HynekBlaha commented 1 month ago

I successfully managed to run a test locally. Used env variables from the pod running on kubernetes. I will add test with env secrets to run in my CI.

HynekBlaha commented 1 month ago

The access token is short lived, so it doesn't make sense to set it as a CI secret in my forked repository. Would it be enough if I share the test I successfully ran locally and steps I took?

1) I described pod running with workload identity and copied following environment variables:

2) I shelled into pod and copied JTW token located at $AZURE_FEDERATED_TOKEN_FILE 3) Set AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_AUTHORITY_HOST as environment variables. 4) Set AZURE_FEDERATED_TOKEN_FILE to local path with content of file from 2) 5) Created test/sql/cloud/workload_identity.test

# name: test/sql/cloud/workload_identity.test
# description: test workload-identity authentication
# group: [azure]

require azure

require parquet

require-env AZURE_CLIENT_ID

require-env AZURE_TENANT_ID

require-env AZURE_FEDERATED_TOKEN_FILE

require-env AZURE_AUTHORITY_HOST

statement ok
set allow_persistent_secrets=false

statement error
SELECT count(*) FROM 'azure://market/testfile.parquet';
----
Invalid Input Error: No valid Azure credentials found!

statement ok
CREATE OR REPLACE SECRET az1 (
    TYPE AZURE,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'workload_identity',
    ACCOUNT_NAME '<REDACTED>'
)

query I
SELECT count(*) FROM 'azure://market/testfile.parquet';
----
2

query I
SELECT count(*) FROM 'azure://market/testfile.parquet';
----
0 # This should fail

6) Ran make test

Is this enough to serve as a proof of valid implementation, @samansmink? 🤞

samansmink commented 1 month ago

@HynekBlaha I'm okay with adding this if you say it's working for you.

Adding CI for this is a pain and i don't think this would break any existing workflow even if it did turn out to break at some point

HynekBlaha commented 1 month ago

Great, that's amazing to hear! Should I open a MR to docs to describe support for workload_identity in CREDENTIAL_CHAIN? Also, is there any timeline when new version of duckdb-azure extension will be released? I would like to use it when possible.
I remember I read somewhere that nightly builds can be explicitly loaded. Is it true?

Thank you for support, Hynek

samansmink commented 1 month ago

Should I open a MR to docs to describe support for workload_identity in CREDENTIAL_CHAIN?

@HynekBlaha Yea that would be great!

The nightly build will be available after this PR is merged and DuckDB is updated in the azure extension (which i will do right in a sec). You can install it on latest duckdb using:

force install azure from core_nightly

samansmink commented 1 month ago

@HynekBlaha after https://github.com/duckdb/duckdb_azure/pull/85 is merged and the CI of that merge has completed your nightly binary will be available on duckdb v1.1.1