Closed paulomarquesc closed 1 year ago
E1129 15:42:50.607679 1 proxy.go:125] proxy "msg"="failed to get token" "error"="failed to acquire token: unable to resolve an endpoint: server response error:\n Get \"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/v2.0/.well-known/openid-configuration\": context deadline exceeded"
@paulomarquesc context deadline exceeded
indicates there is a network issue from pod to that endpoint. As this is happening only a single node, it might be a network issue on that node. You can validate connectivity by exec
into the sidecar container and trying to connect to that endpoint.
Hi @aramase, is there any other log I check? Because while inside the sidecar, I can do a wget on login.microsoftonline.com without any issues:
root [ / ]# wget login.microsoftonline.com
--2022-11-29 17:51:04-- http://login.microsoftonline.com/
Resolving login.microsoftonline.com... 40.126.26.132, 20.190.154.136, 20.190.154.138, ...
Connecting to login.microsoftonline.com|40.126.26.132|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://login.microsoftonline.com:443/ [following]
--2022-11-29 17:51:04-- https://login.microsoftonline.com/
Connecting to login.microsoftonline.com|40.126.26.132|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.office.com/login# [following]
--2022-11-29 17:51:05-- https://www.office.com/login
Resolving www.office.com... 13.107.6.156, 2620:1ec:a92::156
Connecting to www.office.com|13.107.6.156|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://login.microsoftonline.com/common/oauth2/v2.0/authorize?client_id=4765445b-32c6-49b0-83e6-1d93765276ca&redirect_uri=https%3A%2F%2Fwww.office.com%2Flandingv2&response_type=code%20id_token&scope=openid%20profile%20https%3A%2F%2Fwww.office.com%2Fv2%2FOfficeHome.All&response_mode=form_post&nonce=638053410651413416.N2QzM2ZkOTctNmM2Yy00M2Q0LWI0Y2UtYmVmNDNlOTg3MDUxOTcxM2QwZjMtYWJkOS00OTA4LTg3MDktMTRmM2Q1MWU2ZGEx&ui_locales=en-US&mkt=en-US&state=EDrpOA6KU_msgBqSvCCZRwfL_63LNJoZ5nBTFbzHFO0tQKcy849tkTP3LEdt8pTs73IeUueAKloKBuCmFdWvbXN3yE2DWdVKE_mwSZth3PluqCR4a8TdqvtsM2EubLXHhj7cifCC39dJLYi0T-Fgprbma6vrMtKMoVjRDUi5VQoWBDmrPWzwGecb9ut5WdHVMH5PpODVrJoaleedsog5bM1e0Sx9_jatJp7wTv5jNnDb5zuYtQJi3PHTbBtx6dwlUgzLqFSPEq4oro4YbC9I7w&x-client-SKU=ID_NETSTANDARD2_0&x-client-ver=6.16.0.0 [following]
--2022-11-29 17:51:05-- https://login.microsoftonline.com/common/oauth2/v2.0/authorize?client_id=4765445b-32c6-49b0-83e6-1d93765276ca&redirect_uri=https%3A%2F%2Fwww.office.com%2Flandingv2&response_type=code%20id_token&scope=openid%20profile%20https%3A%2F%2Fwww.office.com%2Fv2%2FOfficeHome.All&response_mode=form_post&nonce=638053410651413416.N2QzM2ZkOTctNmM2Yy00M2Q0LWI0Y2UtYmVmNDNlOTg3MDUxOTcxM2QwZjMtYWJkOS00OTA4LTg3MDktMTRmM2Q1MWU2ZGEx&ui_locales=en-US&mkt=en-US&state=EDrpOA6KU_msgBqSvCCZRwfL_63LNJoZ5nBTFbzHFO0tQKcy849tkTP3LEdt8pTs73IeUueAKloKBuCmFdWvbXN3yE2DWdVKE_mwSZth3PluqCR4a8TdqvtsM2EubLXHhj7cifCC39dJLYi0T-Fgprbma6vrMtKMoVjRDUi5VQoWBDmrPWzwGecb9ut5WdHVMH5PpODVrJoaleedsog5bM1e0Sx9_jatJp7wTv5jNnDb5zuYtQJi3PHTbBtx6dwlUgzLqFSPEq4oro4YbC9I7w&x-client-SKU=ID_NETSTANDARD2_0&x-client-ver=6.16.0.0
Connecting to login.microsoftonline.com|40.126.26.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 201363 (197K) [text/html]
Saving to: 'index.html'
index.html 100%[======================================================================>] 196.64K --.-KB/s in 0.001s
2022-11-29 17:51:05 (146 MB/s) - 'index.html' saved [201363/201363]
And it also works fine following the whole URI (https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/v2.0/.well-known/openid-configuration):
root [ / ]# wget https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/v2.0/.well-known/openid-configuration
--2022-11-29 17:53:07-- https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/v2.0/.well-known/openid-configuration
Resolving login.microsoftonline.com... 20.190.154.139, 20.190.154.16, 20.190.154.18, ...
Connecting to login.microsoftonline.com|20.190.154.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1753 (1.7K) [application/json]
Saving to: 'openid-configuration'
openid-configuration 100%[======================================================================>] 1.71K --.-KB/s in 0s
2022-11-29 17:53:07 (880 MB/s) - 'openid-configuration' saved [1753/1753]
root [ / ]# cat openid-configuration
{"token_endpoint":"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/v2.0/token","token_endpoint_auth_methods_supported":["client_secret_post","private_key_jwt","client_secret_basic"],"jwks_uri":"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/discovery/v2.0/keys","response_modes_supported":["query","fragment","form_post"],"subject_types_supported":["pairwise"],"id_token_signing_alg_values_supported":["RS256"],"response_types_supported":["code","id_token","code id_token","id_token token"],"scopes_supported":["openid","profile","email","offline_access"],"issuer":"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/v2.0","request_uri_parameter_supported":false,"userinfo_endpoint":"https://graph.microsoft.com/oidc/userinfo","authorization_endpoint":"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/v2.0/authorize","device_authorization_endpoint":"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/v2.0/devicecode","http_logout_supported":true,"frontchannel_logout_supported":true,"end_session_endpoint":"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/v2.0/logout","claims_supported":["sub","iss","cloud_instance_name","cloud_instance_host_name","cloud_graph_host_name","msgraph_host","aud","exp","iat","auth_time","acr","nonce","preferred_username","name","tid","ver","at_hash","c_hash","email"],"kerberos_endpoint":"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/kerberos","tenant_region_scope":"WW","cloud_instance_name":"microsoftonline.com","cloud_graph_host_name":"graph.windows.net","msgraph_host":"graph.microsoft.com","rbac_url":"https://pas.windows.net"}
And now it seems to be working again:
I1129 17:31:13.901033 1 proxy.go:107] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=9c825a3f-4493-42cb-80ba-746320a5d8e7&resource=https%3A%2F%2Fvault.azure.net"
I1129 17:31:14.532131 1 proxy.go:129] proxy "msg"="successfully acquired token" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=9c825a3f-4493-42cb-80ba-746320a5d8e7&resource=https%3A%2F%2Fvault.azure.net"
This seems not well stable, I never ran into this type of issue using the deprecated AAD container solution.
Another thing that is puzzling me is that why this solution is using the global AAD endpoint instead of the regional ones (e-stsr)
This seems not well stable, I never ran into this type of issue using the deprecated AAD container solution.
If it's working now, that could mean intermittent network issue in the node. The old AAD Pod Identity project relied on getting tokens from IMDS and not directly from AAD endpoint. IMDS is not used in workload identity federation. Also, the proxy is just a migration solution. Long term, if you use the latest SDK, you don't need to use the proxy sidecar.
Another thing that is puzzling me is that why this solution is using the global AAD endpoint instead of the regional ones (e-stsr)
ests-r currently only supports 1st party apps. There are plans to support it for 3P Apps. Also, that would be done internally in the SDK and not specifically in the proxy. So whenever that's supported in the MSAL SDK with 3P Apps, the proxy will use it.
Thanks @aramase, is there a way I can set a liveness probe (or similar mechanism) in this sidecar so k8s can recycle it if it is having these issues?
@aramase , something is not right with the azwi-proxy, I redeployed a few times, everything was fine, then another sidecar started to act in the same way:
I1201 15:47:23.282305 1 proxy.go:107] proxy "msg"="received token request" "method"="GET" "uri"="/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=9c825a3f-4493-42cb-80ba-746320a5d8e7&resource=https%3A%2F%2Fvault.azure.net"
E1201 15:47:53.284700 1 proxy.go:125] proxy "msg"="failed to get token" "error"="failed to acquire token: unable to resolve an endpoint: server response error:\n Get \"https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/v2.0/.well-known/openid-configuration\": context deadline exceeded"
I'm one of the dev owners of a 1st party Azure service and using AKS with Mariner 2.0 as my nodes, Mariner 2.0 as the image of my app and azwi-proxy cannot just keep failing to reach out to login.microsoft.com to obtain MSI credentials, the team providing this solution needs to go further in the investigation and not simply come with 'your side car is having network issues'.
@paulomarquesc If you have steps to consistently repro this, i'll be happy to take a look. If you have access to an environment where this is happening, please feel free to share access on team.
the team providing this solution needs to go further in the investigation and not simply come with 'your side car is having network issues'.
If the issue is happening intermittently and only on some worker nodes it could be a temporary blip on the node. We're running the tests as part of the CI and nightly runs where we haven't observed similar issues.
As I mentioned in the previous comment, the sidecar is only a temporary solution. If your application uses the latest SDK (https://azure.github.io/azure-workload-identity/docs/topics/language-specific-examples/azure-identity-sdk.html), there is no requirement for the sidecar. By dropping the sidecar proxy usage, your workload will directly make calls to login.microsoft.com
.
To close this thread, I converted the first application based on go using the examples provided in the repo and it worked fine using the webhook instead of the sidecar. Thanks @aramase.
I'm working on adopting Azure Workload Identity and from time to time the container azwi-proxy seems to be unable to obtain tokens, here is the log from the azwi-proxy container on the failing POD (anfgenevaactionssreapiworker-5c66c488c6-4t527):
This is intermittent but when it gets into this state the application running that depends on Managed Identity authentication starts failing due to the error above, also, when this happens it is not on all PODs, you can see the logs from other two (out 3 PODS) azwi-proxy container, where they seem to still be cached:
Here is the output of the get pods and we can see that the one that is trying to obtain the new token is the one that keeps restarting: