Open thomasfrederikhoeck opened 1 year ago
https://github.com/Azure/azure-workload-identity/pull/594 should be fix for this. I'll test this scenario to confirm.
@thomasfrederikhoeck the issue should be fixed with #594. Would you be able to try out aramase/proxy:v0.14.0-alpha.0
for the proxy?
Thank you for the quick response and fix @aramase ! The fix still requires sidecar to be used, right? :-) I'm not 100% how to test it as we have deployed the workload identity with az aks update -g <rg_name>-n <cluster_name>--enable-oidc-issuer --enable-workload-identity
? Also, when this is eventually merged will our cluster automatically use the fix because we deployed this way?
@thomasfrederikhoeck The image I shared above is for the sidecar. You can create a yaml similar to this with your application instead of adding the annotation to test if it works for your workload.
Also, when this is eventually merged will our cluster automatically use the fix because we deployed this way?
The fix is merged and will be part of next OSS release. After OSS release, AKS will use the latest release for the add-on and rollout to all clusters.
@aramase I didn't fix it for me :-/ I still see the same behavior when using mcr.microsoft.com/oss/azure/workload-identity/proxy-init:v0.13.0
with aramase/proxy:v0.14.0-alpha.0
when I apply:
apiVersion: v1
kind: Pod
metadata:
name: awli-test
spec:
serviceAccountName: my-workload-identity
initContainers:
- name: init-networking
image: mcr.microsoft.com/oss/azure/workload-identity/proxy-init:v0.13.0
securityContext:
capabilities:
add:
- NET_ADMIN
drop:
- ALL
privileged: true
runAsUser: 0
env:
- name: PROXY_PORT
value: "8000"
containers:
- name: proxy
image: aramase/proxy:v0.14.0-alpha.0
ports:
- containerPort: 8000
resources:
limits:
cpu: "2"
memory: 10G
requests:
cpu: 500m
memory: 2G
- name: main
args:
- /bin/bash
- -c
- --
- while true; do sleep 30; done;
env:
- name: SERVER
value: $SERVER
- name: UID
value: $CLIENT_ID
image: my-image
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: "2"
memory: 10G
requests:
cpu: 500m
memory: 5G
I know it is not a networking issue as it worked for both AAD Pod identity and for Service Principal. Can you get it to work if you use the image that i provided?
I know it is not a networking issue as it worked for both AAD Pod identity and for Service Principal. Can you get it to work if you use the image that i provided?
@thomasfrederikhoeck for me the sample image doesn't work even without workload/pod identity. I just tried assigning the identity manually to the VMSS and deployed the pod which still fails with the same error.
@aramase weird! The above worked for me both in AAD Pod identity and it also currently works fine in Azure Functions with docker image and user-managed identity. Do you have a cluster with AAD pod identity where you can see if it works for you?
@aramase I just SSH'ed into one of our Azure Functions and ran the above python snip-it with success with an appropriate client_id and server address.
The driver version is:
~ odbcinst -q -d -n "ODBC Driver 17 for SQL Server"
[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.1.1
UsageCount=1
So the above code works with User-Managed Identity in Azure Functions. :-) I it worked with AAD Pod Identity.
@aramase do you have an env where you can replicate? Or still no luck?
@aramase do you have an env where you can replicate? Or still no luck?
Still no luck. I'll try again later this week.
Hi @aramase :-) Have you had more time to look at the issue?
Hi @aramase :-) Have you had more time to look at the issue?
@thomasfrederikhoeck unfortunately, not yet! I merged a few more changes for the proxy. Could you try our nightly build ghcr.io/azure/azure-workload-identity/proxy:latest
from the main branch?
Hi @aramase, just FYI, proxy implementation also break java mssql-jdbc drivers because they expect expire_in field to be string, but it is a number - https://github.com/Azure/azure-workload-identity/blob/d7b71919d06cd963a77b90f0ca4a1ab6d4b83025/pkg/proxy/proxy.go#L59
There is parsing logic based on it, see https://github.com/microsoft/mssql-jdbc/blob/e4e783a0eb13d945b651f400f869cd1edead8f17/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerSecurityUtility.java#L427 This applies to any version before latest, 12 (there was some serious refactoring and they replaced usage of expires_in with expires_at, which is unchanged in proxy)
Could this be fixed? Otherwise it ruins pretty much everything.
@aramase I'm still seeing this error. I tried deploying a new AKS cluster using the Workload Identity in GA. I tried both newest SQL Server ODBC 17 Driver and SQL ODBC 18 Driver and run the following code, but it still doesn't connect. I have also confimred that it is not a firewall issues since it works with a Service Principal. I have also checked that the Workload Identity works for other Azure resources such as a Key Vault. The application looks like this:
import sqlalchemy
import urllib
import os
import pandas as pd
server = "MY_AZURE_SQL_SERVER"
database = "MY_AZURE_SQL_DATABASE"
driver = "ODBC Driver 18 for SQL Server"
client_id_msi = "MY ID"
connect_params = [
f"DRIVER={{{driver}}}",
f"SERVER={{{server}}}",
f"DATABASE={{{database}}}",
"Encrypt=yes",
"TrustServerCertificate=no",
"Connection Timeout=30",
f"UID={{{client_id_msi}}}",
"Authentication=ActiveDirectoryMsi"
]
params = urllib.parse.quote_plus(";".join(connect_params))
connection_string = f"mssql+pyodbc:///?odbc_connect={params}"
engine = sqlalchemy.create_engine(
connection_string,
fast_executemany=True,
)
pd.read_sql("select 1", engine)
It looks like there is also people at Microsoft hitting this issue: https://techcommunity.microsoft.com/t5/azure-database-support-blog/lesson-learned-384-odbc-driver-not-supporting-aks-workload/ba-p/3858209
Hi, any updates on this? I'm also having issues with ODBC driver for workload identity. I was able to connect using ADO/jdbc connections strings, but having below error when using odbc. [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Login failed for user 'token-identified principal'.
Describe the bug I have a container that connect to a Azure Database using a User Managed Identity. It previously worked when using AAD Pod identity. I have tried enabling the sidecar (as I think the driver uses ADAL) and I can see in the logs that it make some calls but it somehow doesn't work when it gets back to the container.
Steps To Reproduce
1: Create an appropriate image with ODBC driver and python:
where the script.py is:
2: deploy to cluster with following yaml:
Where
URL_FOR_AZURE_SQL_SERVER
,CLIENT_ID_FOR_USER_MANAGED_IDENITITY
andtest-image:latest
are changed accordingly.3: See logs before running script by running
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "//script.py", line 19, in
engine.execute("select 1")
File "", line 2, in execute
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/deprecations.py", line 402, in warned
return fn(*args, kwargs)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3256, in execute
connection = self.connect(close_with_result=True)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3315, in connect
return self._connection_cls(self, close_with_result=close_with_result)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 96, in init
else engine.raw_connection()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3394, in raw_connection
return self._wrap_pool_connect(self.pool.connect, _connection)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3364, in _wrap_pool_connect
Connection._handle_dbapi_exception_noconnection(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2198, in _handle_dbapi_exceptionnoconnection
util.raise(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3361, in _wrap_pool_connect
return fn()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 320, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 884, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 486, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 146, in _do_get
self._decoverflow()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in exit
compat.raise(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 266, in _create_connection
return ConnectionRecord(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 381, in init
self.connect()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 678, in connect
pool.logger.debug("Error on connect(): %s", e)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in exit
compat.raise(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 673, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/create.py", line 578, in connect
return dialect.connect(*cargs, *cparams)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 598, in connect
return self.dbapi.connect(cargs, cparams)
sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('HYT00', '[HYT00] [unixODBC][Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')
kubectl logs -n prefect awli-test -c azwi-proxy I1017 10:02:24.612091 1 proxy.go:96] proxy "msg"="starting the proxy server" "port"=8000 "userAgent"="azure-workload-identity/proxy/v0.13.0 (linux/amd64) 46c5137/2022-08-31-23:38" I1017 10:02:24.653617 1 proxy.go:167] proxy "msg"="received readyz request" "method"="GET" "uri"="/readyz" I1017 10:03:14.931967 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f2" I1017 10:03:15.432570 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f" I1017 10:03:16.433120 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f" I1017 10:03:18.433681 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f" I1017 10:03:33.455898 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f" I1017 10:03:33.956334 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f" I1017 10:03:34.956857 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f" I1017 10:03:36.957431 1 proxy.go:101] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fdatabase.windows.net%2F&object_id=h869042n-6f42-5jg5-5g2f-48956645645f"
kubectl version Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"windows/amd64"} Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.15", GitCommit:"1d79bc3bcccfba7466c44cc2055d6e7442e140ea", GitTreeState:"clean", BuildDate:"2022-09-21T21:45:46Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}