Closed HynekBlaha closed 1 month ago
Hey @HynekBlaha thanks for the PR!
We do really need to have some testing to be able to merge this PR though, at least some instructions on how we can test this, if setting that up in CI is tricky.
Understood. I would like to test it in real k8s cluster with provided workload identity.
The way I see it, I need to complete these steps: 1) cross-compile it to same arch as python-slim image has 2) upload built extension to public https (eg by commiting the executable to my github project) 3) load it in kubernetes 4) validate workload_identity works.
Do you have some example how to do step 1? Other steps should be a piece of cake.
I successfully managed to run a test locally. Used env variables from the pod running on kubernetes. I will add test with env secrets to run in my CI.
The access token is short lived, so it doesn't make sense to set it as a CI secret in my forked repository. Would it be enough if I share the test I successfully ran locally and steps I took?
1) I described pod running with workload identity and copied following environment variables:
2) I shelled into pod and copied JTW token located at $AZURE_FEDERATED_TOKEN_FILE
3) Set AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_AUTHORITY_HOST as environment variables.
4) Set AZURE_FEDERATED_TOKEN_FILE to local path with content of file from 2)
5) Created test/sql/cloud/workload_identity.test
# name: test/sql/cloud/workload_identity.test
# description: test workload-identity authentication
# group: [azure]
require azure
require parquet
require-env AZURE_CLIENT_ID
require-env AZURE_TENANT_ID
require-env AZURE_FEDERATED_TOKEN_FILE
require-env AZURE_AUTHORITY_HOST
statement ok
set allow_persistent_secrets=false
statement error
SELECT count(*) FROM 'azure://market/testfile.parquet';
----
Invalid Input Error: No valid Azure credentials found!
statement ok
CREATE OR REPLACE SECRET az1 (
TYPE AZURE,
PROVIDER CREDENTIAL_CHAIN,
CHAIN 'workload_identity',
ACCOUNT_NAME '<REDACTED>'
)
query I
SELECT count(*) FROM 'azure://market/testfile.parquet';
----
2
query I
SELECT count(*) FROM 'azure://market/testfile.parquet';
----
0 # This should fail
6) Ran make test
Is this enough to serve as a proof of valid implementation, @samansmink? 🤞
@HynekBlaha I'm okay with adding this if you say it's working for you.
Adding CI for this is a pain and i don't think this would break any existing workflow even if it did turn out to break at some point
Great, that's amazing to hear!
Should I open a MR to docs to describe support for workload_identity in CREDENTIAL_CHAIN?
Also, is there any timeline when new version of duckdb-azure extension will be released? I would like to use it when possible.
I remember I read somewhere that nightly builds can be explicitly loaded. Is it true?
Thank you for support, Hynek
Should I open a MR to docs to describe support for workload_identity in CREDENTIAL_CHAIN?
@HynekBlaha Yea that would be great!
The nightly build will be available after this PR is merged and DuckDB is updated in the azure extension (which i will do right in a sec). You can install it on latest duckdb using:
force install azure from core_nightly
@HynekBlaha after https://github.com/duckdb/duckdb_azure/pull/85 is merged and the CI of that merge has completed your nightly binary will be available on duckdb v1.1.1
Hello, I would like to discuss adding option for using Workload Identity credentials. From my (very limited) point of view, adding support is straight-forward as the
WorkloadIdentityCredential
is part ofDefaultAzureCredential
since 1.6.0: https://github.com/Azure/azure-sdk-for-cpp/issues/4906Notes:
Sorry if the code is not up to standards, any suggestions how to improve it are very welcome.