duckdb / duckdb_azure

Azure extension for DuckDB
MIT License

Workload Identity Credentials #82

Closed HynekBlaha closed 1 month ago

HynekBlaha commented 1 month ago

A WorkloadIdentityCredential is created as part of DefaultAzureCredential, but when fetching a token the chain falls through and tries to run az, which shouldn't be necessary. Even with az installed, it still doesn't work. Could you please add support for workload identity credentials?
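For context, DefaultAzureCredential tries EnvironmentCredential, WorkloadIdentityCredential, AzureCliCredential and ManagedIdentityCredential in that order (see the "Created with the following credentials" line in the log below) and uses the first token it gets. A minimal Python model of that fallback is sketched here; the function and exception names are hypothetical stand-ins, not the Azure SDK's C++ API. It only illustrates why a failing WorkloadIdentityCredential ends with an attempt to run az.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: a credential is a (name, get_token) pair where
# get_token() either returns an access token string or raises.
Credential = Tuple[str, Callable[[], str]]


class CredentialUnavailable(Exception):
    """Raised when no credential in the chain produced a token."""


def get_token_from_chain(chain: List[Credential]) -> Tuple[str, str]:
    """Try each credential in order and return (name, token) for the first success."""
    failures: List[str] = []
    for name, get_token in chain:
        try:
            return name, get_token()
        except Exception as exc:
            # Record why this credential failed, then fall through to the next one.
            failures.append(f"{name}: {exc}")
    raise CredentialUnavailable("; ".join(failures))


def failing(message: str) -> Callable[[], str]:
    """Build a get_token() that always raises with the given message."""
    def get_token() -> str:
        raise RuntimeError(message)
    return get_token


if __name__ == "__main__":
    # Same order as in the DefaultAzureCredential log; every step fails, as in the issue.
    chain: List[Credential] = [
        ("EnvironmentCredential", failing("environment variables are not fully configured")),
        ("WorkloadIdentityCredential", failing("Problem with the SSL CA cert")),
        ("AzureCliCredential", failing("/bin/sh: 1: az: not found")),
        ("ManagedIdentityCredential", failing("400 Bad Request")),
    ]
    try:
        get_token_from_chain(chain)
    except CredentialUnavailable as exc:
        print("No credential in the chain succeeded:", exc)
```

In this model, AzureCliCredential is only reached because WorkloadIdentityCredential raised; if the workload identity step returned a token, az would never be invoked.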

>>> import duckdb, os
>>> os.environ["AZURE_LOG_LEVEL"] = "verbose"
>>> duckdb.sql("CREATE OR REPLACE SECRET credentials (TYPE AZURE, PROVIDER CREDENTIAL_CHAIN, ACCOUNT_NAME 'redacted')")
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ true    │
└─────────┘

>>> duckdb.sql("SELECT * FROM parquet_metadata('az://file.parquet')")
[2024-09-30T13:35:39.3506030Z T: 7f9d48be4b80] DEBUG : Identity: Creating DefaultAzureCredential which combines mutiple parameterless credentials into a single one.
DefaultAzureCredential is only recommended for the early stages of development, and not for usage in production environment.
Once the developer focuses on the Credentials and Authentication aspects of their application, DefaultAzureCredential needs to be replaced with the credential that is the better fit for the application.
[2024-09-30T13:35:39.3506412Z T: 7f9d48be4b80] WARN  : Identity: EnvironmentCredential was not initialized with underlying credential.
[2024-09-30T13:35:39.3506747Z T: 7f9d48be4b80] DEBUG : Identity: EnvironmentCredential: Both 'AZURE_TENANT_ID' and 'AZURE_CLIENT_ID', and at least one of 'AZURE_CLIENT_SECRET', 'AZURE_CLIENT_CERTIFICATE_PATH' needs to be set. Additionally, 'AZURE_AUTHORITY_HOST' could be set to override the default authority host. Currently:
 * 'AZURE_TENANT_ID' is set
 * 'AZURE_CLIENT_ID' is set
 * 'AZURE_CLIENT_SECRET' is NOT set
 * 'AZURE_CLIENT_CERTIFICATE_PATH' is NOT set
 * 'AZURE_AUTHORITY_HOST' is set
 * 
->>>>>>> [2024-09-30T13:35:39.3507587Z T: 7f9d48be4b80] INFO  : Identity: WorkloadIdentityCredential was created successfully.

[2024-09-30T13:35:39.3507886Z T: 7f9d48be4b80] INFO  : Identity: AzureCliCredential created.
Successful creation does not guarantee further successful token retrieval.
[2024-09-30T13:35:39.3508203Z T: 7f9d48be4b80] DEBUG : Identity: ManagedIdentityCredential: Environment is not set up for the credential to be created with App Service 2019 source.
[2024-09-30T13:35:39.3508450Z T: 7f9d48be4b80] DEBUG : Identity: ManagedIdentityCredential: Environment is not set up for the credential to be created with App Service 2017 source.
[2024-09-30T13:35:39.3508654Z T: 7f9d48be4b80] DEBUG : Identity: ManagedIdentityCredential: Environment is not set up for the credential to be created with Cloud Shell source.
[2024-09-30T13:35:39.3508828Z T: 7f9d48be4b80] DEBUG : Identity: ManagedIdentityCredential: Environment is not set up for the credential to be created with Azure Arc source.
[2024-09-30T13:35:39.3509023Z T: 7f9d48be4b80] INFO  : Identity: ManagedIdentityCredential will be created with Azure Instance Metadata Service source.
Successful creation does not guarantee further successful token retrieval.
[2024-09-30T13:35:39.3509508Z T: 7f9d48be4b80] INFO  : Identity: DefaultAzureCredential: Created with the following credentials: EnvironmentCredential, WorkloadIdentityCredential, AzureCliCredential, ManagedIdentityCredential.
[2024-09-30T13:35:39.3509735Z T: 7f9d48be4b80] INFO  : Identity: ChainedTokenCredential: Created with the following credentials: DefaultAzureCredential.
[2024-09-30T13:35:39.3518768Z T: 7f9d43dfe6c0] WARN  : Identity: EnvironmentCredential authentication unavailable. See earlier EnvironmentCredential log messages for details.
[2024-09-30T13:35:39.3519527Z T: 7f9d43dfe6c0] DEBUG : Identity: DefaultAzureCredential: Failed to get token from EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
[2024-09-30T13:35:39.3520868Z T: 7f9d43dfe6c0] INFO  : HTTP Request : POST https://login.microsoftonline.com/0becdf60-f164-4284-a1b5-dc033963ad20/oauth2/v2.0/token
content-length : 1705
content-type : application/x-www-form-urlencoded
host : REDACTED
user-agent : azsdk-cpp-identity/1.6.0 (Linux 5.15.0-1064-azure x86_64 #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024)
x-ms-client-request-id : 15d02e19-4dce-4a34-8513-f6889c07e14b
[2024-09-30T13:35:39.3521133Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Creating a new session.
[2024-09-30T13:35:39.3521432Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Spawn new connection.
[2024-09-30T13:35:39.3866326Z T: 7f9d43dfe6c0] WARN  : HTTP Transport error: Fail to get a new connection for: https://login.microsoftonline.com. Problem with the SSL CA cert (path? access rights?)
[2024-09-30T13:35:39.3866807Z T: 7f9d43dfe6c0] INFO  : HTTP Retry attempt #1 will be made in 976ms.
[2024-09-30T13:35:40.3628422Z T: 7f9d43dfe6c0] INFO  : HTTP Request : POST https://login.microsoftonline.com/0becdf60-f164-4284-a1b5-dc033963ad20/oauth2/v2.0/token
content-length : 1705
content-type : application/x-www-form-urlencoded
host : REDACTED
user-agent : azsdk-cpp-identity/1.6.0 (Linux 5.15.0-1064-azure x86_64 #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024)
x-ms-client-request-id : 15d02e19-4dce-4a34-8513-f6889c07e14b
[2024-09-30T13:35:40.3629810Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Creating a new session.
[2024-09-30T13:35:40.3630499Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Spawn new connection.
[2024-09-30T13:35:40.3904652Z T: 7f9d43dfe6c0] WARN  : HTTP Transport error: Fail to get a new connection for: https://login.microsoftonline.com. Problem with the SSL CA cert (path? access rights?)
[2024-09-30T13:35:40.3907599Z T: 7f9d43dfe6c0] INFO  : HTTP Retry attempt #2 will be made in 1595ms.
[2024-09-30T13:35:41.9862518Z T: 7f9d43dfe6c0] INFO  : HTTP Request : POST https://login.microsoftonline.com/0becdf60-f164-4284-a1b5-dc033963ad20/oauth2/v2.0/token
content-length : 1705
content-type : application/x-www-form-urlencoded
host : REDACTED
user-agent : azsdk-cpp-identity/1.6.0 (Linux 5.15.0-1064-azure x86_64 #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024)
x-ms-client-request-id : 15d02e19-4dce-4a34-8513-f6889c07e14b
[2024-09-30T13:35:41.9868752Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Creating a new session.
[2024-09-30T13:35:41.9871915Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Spawn new connection.
[2024-09-30T13:35:42.0242381Z T: 7f9d43dfe6c0] WARN  : HTTP Transport error: Fail to get a new connection for: https://login.microsoftonline.com. Problem with the SSL CA cert (path? access rights?)
[2024-09-30T13:35:42.0243179Z T: 7f9d43dfe6c0] INFO  : HTTP Retry attempt #3 will be made in 3812ms.
[2024-09-30T13:35:45.8407666Z T: 7f9d43dfe6c0] INFO  : HTTP Request : POST https://login.microsoftonline.com/0becdf60-f164-4284-a1b5-dc033963ad20/oauth2/v2.0/token
content-length : 1705
content-type : application/x-www-form-urlencoded
host : REDACTED
user-agent : azsdk-cpp-identity/1.6.0 (Linux 5.15.0-1064-azure x86_64 #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024)
x-ms-client-request-id : 15d02e19-4dce-4a34-8513-f6889c07e14b
[2024-09-30T13:35:45.8408228Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Creating a new session.
[2024-09-30T13:35:45.8408462Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Spawn new connection.
[2024-09-30T13:35:45.9161489Z T: 7f9d43dfe6c0] WARN  : HTTP Transport error: Fail to get a new connection for: https://login.microsoftonline.com. Problem with the SSL CA cert (path? access rights?)

->>>>>> [2024-09-30T13:35:45.9162850Z T: 7f9d43dfe6c0] DEBUG : Identity: DefaultAzureCredential: Failed to get token from WorkloadIdentityCredential: GetToken(): Fail to get a new connection for: https://login.microsoftonline.com. Problem with the SSL CA cert (path? access rights?)
/bin/sh: 1: az: not found

[2024-09-30T13:35:45.9667745Z T: 7f9d43dfe6c0] DEBUG : Identity: TokenCredentialImpl::ParseToken(): Cannot parse the string '' as JSON.
[2024-09-30T13:35:45.9668740Z T: 7f9d43dfe6c0] WARN  : Identity: AzureCliCredential didn't get the token: ""
[2024-09-30T13:35:45.9669597Z T: 7f9d43dfe6c0] DEBUG : Identity: DefaultAzureCredential: Failed to get token from AzureCliCredential: AzureCliCredential didn't get the token: ""
[2024-09-30T13:35:45.9670616Z T: 7f9d43dfe6c0] INFO  : HTTP Request : GET http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=REDACTED
metadata : REDACTED
user-agent : azsdk-cpp-identity/1.6.0 (Linux 5.15.0-1064-azure x86_64 #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024)
x-ms-client-request-id : 64283197-1e37-49cb-beb0-bdfd3c571d67
[2024-09-30T13:35:45.9671409Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Creating a new session.
[2024-09-30T13:35:45.9671971Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Spawn new connection.
[2024-09-30T13:35:45.9677746Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: No Host in request headers. Adding it
[2024-09-30T13:35:45.9678681Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Send request without payload
[2024-09-30T13:35:45.9680388Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Parse server response
[2024-09-30T13:35:45.9733376Z T: 7f9d43dfe6c0] DEBUG : [CURL Transport Adapter]: Request completed. Moving response out of session and session to response.
[2024-09-30T13:35:45.9734875Z T: 7f9d43dfe6c0] INFO  : HTTP Response (6ms) : 400 Bad Request
content-length : 168
content-type : application/json; charset=utf-8
date : Mon, 30 Sep 2024 13:35:45 GMT
server : IMDS/150.870.65.1475
x-ms-request-id : 9366ffa2-210f-4b2c-8480-06edaf9b7864
[2024-09-30T13:35:45.9735804Z T: 7f9d43dfe6c0] INFO  : HTTP status code 400 won't be retried.
[2024-09-30T13:35:45.9736719Z T: 7f9d43dfe6c0] DEBUG : Identity: DefaultAzureCredential: Failed to get token from ManagedIdentityCredential: GetToken(): error response: 400 Bad Request
[2024-09-30T13:35:45.9737231Z T: 7f9d43dfe6c0] WARN  : Identity: DefaultAzureCredential: Didn't succeed to get a token from any credential in the chain.
[2024-09-30T13:35:45.9737859Z T: 7f9d43dfe6c0] DEBUG : Identity: ChainedTokenCredential: Failed to get token from DefaultAzureCredential: Failed to get token from DefaultAzureCredential.
See Azure::Core::Diagnostics::Logger for details (https://aka.ms/azsdk/cpp/identity/troubleshooting).
[2024-09-30T13:35:45.9738436Z T: 7f9d43dfe6c0] WARN  : Identity: ChainedTokenCredential: Didn't succeed to get a token from any credential in the chain.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
duckdb.duckdb.IOException: IO Error: AzureBlobStorageFileSystem could not open file: 'az://file.parquet', unknown error occurred, this could mean the credentials used were wrong. Original error message: 'Failed to get token from ChainedTokenCredential.'
>>>
HynekBlaha commented 1 month ago

Hello, I opened a PR with this feature here: https://github.com/duckdb/duckdb_azure/pull/83. My team would love to start using DuckDB in our Kubernetes workloads, so I am highly invested in this feature. @quentingodeau, could you please find time to review it and suggest improvements? 🙏

Thank you!

HynekBlaha commented 1 month ago

Hi @samansmink, I can confirm that the nightly build works on our Kubernetes workloads. ✅

>>> import duckdb
>>>
>>> duckdb.sql("force install azure from core_nightly")
>>> # https://github.com/duckdb/duckdb/discussions/9675#discussioncomment-9327842
>>> duckdb.sql("SET azure_transport_option_type = 'curl'") 
>>> duckdb.sql("CREATE OR REPLACE SECRET credentials (TYPE AZURE, PROVIDER CREDENTIAL_CHAIN, CHAIN 'workload_identity', ACCOUNT_NAME '<REDACTED>')")
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ true    │
└─────────┘

>>> duckdb.sql("SELECT count(*) FROM 'azure://market/testfile.parquet'")
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│            2 │
└──────────────┘
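If the secret is created from application code rather than typed into a REPL, a small helper can keep the statement from the session above in one place and escape the account name as a SQL string literal. The helper name is hypothetical; with secret_name "credentials" it produces exactly the statement shown above, including CHAIN 'workload_identity', which worked with the nightly build at the time of this comment.

```python
def build_workload_identity_secret_sql(secret_name: str, account_name: str) -> str:
    """Build a CREATE SECRET statement that uses the workload_identity credential chain."""
    # The secret name is emitted unquoted as an identifier, so reject anything unusual.
    if not secret_name.isidentifier():
        raise ValueError(f"invalid secret name: {secret_name!r}")

    def quote(value: str) -> str:
        # SQL string literal: wrap in single quotes and double any embedded quote.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"CREATE OR REPLACE SECRET {secret_name} "
        "(TYPE AZURE, PROVIDER CREDENTIAL_CHAIN, CHAIN 'workload_identity', "
        f"ACCOUNT_NAME {quote(account_name)})"
    )
```

For example, after duckdb.sql("SET azure_transport_option_type = 'curl'"), the secret would be created with duckdb.sql(build_workload_identity_secret_sql("credentials", "<REDACTED>")).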