Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

CBS Token failed on workload identity with DefaultAzureCredential() #31798

Closed jonathan-fileread closed 1 year ago

jonathan-fileread commented 1 year ago

Describe the bug Getting a "CBS token failed" error, shown in the log line below. We're running ServiceBusClient(fully_qualified_namespace="namespacehere", credential=DefaultAzureCredential()) in a deployment within AKS, tied to workload identity.

2023-08-24 22:26:37,126 azure.servicebus._pyamqp.cbs INFO CBS Put token error: b'amqp:unauthorized-access'

What I've done so far

Here are the things I have tried to do

- locally: code tested locally and works with AzureCliCredential (az login method)
- managed identity: checked the role assignments for the managed identity; it has all the Service Bus owner roles
- network security groups: made sure both sides allow ports 5671/5672 for AMQP to each other (Service Bus and AKS are on different subnets)
- debug client id: checked that AZURE_CLIENT_ID and AZURE_TENANT_ID were present in the container, and the logs show the token was successfully retrieved for the Azure credentials
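The environment check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original report: the thread only mentions AZURE_CLIENT_ID and AZURE_TENANT_ID, while AZURE_FEDERATED_TOKEN_FILE and AZURE_AUTHORITY_HOST are added here on the assumption that the workload identity webhook injects them as well. The function name is hypothetical.

```python
import os
from typing import Mapping

# Variables the workload identity webhook is assumed to inject into the pod.
# Only the first two are mentioned in the thread; the others are an assumption.
WORKLOAD_IDENTITY_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
)


def missing_workload_identity_vars(env: Mapping[str, str] = os.environ) -> list:
    """Return the names of expected variables that are unset or empty."""
    return [name for name in WORKLOAD_IDENTITY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_workload_identity_vars()
    print("missing:", missing if missing else "none")
```

Running this inside the container gives a quick yes/no answer before digging into the AMQP logs. It says nothing about whether the identity is authorized on the namespace.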

To Reproduce Steps to reproduce the behavior:

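  1. Set up workload identity and deploy to AKS with Service Bus. Containerize the following and run it in AKS.
    from azure.identity import DefaultAzureCredential
    from azure.servicebus import ServiceBusClient

    QUEUE_NAME = "queuenamehere"  # placeholder; the queue name was not shown in the original post

    # DefaultAzureCredential is expected to pick up workload identity inside the pod
    credential = DefaultAzureCredential()

    servicebus_client = ServiceBusClient(
        fully_qualified_namespace="namespacehere.servicebus.windows.net",
        credential=credential,
        logging_enable=True
    )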

    with servicebus_client:
        receiver = servicebus_client.get_queue_receiver(queue_name=QUEUE_NAME)
        with receiver:
            received_msgs = receiver.receive_messages(max_message_count=10, max_wait_time=5)
            for msg in received_msgs:
                print(str(msg))
                receiver.complete_message(msg)

Expected behavior The code should run without error in AKS after the deployment is created, processing messages from Service Bus.


Adding debugging log here:

Here are some of the logs that I captured with DEBUG-level logging configured and logging_enable=True on the client:
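For readers who want to produce comparable output, here is a minimal sketch of a logging setup using only the standard library. It assumes the SDK packages log under the `azure` logger namespace, which matches the logger names in the output below. The helper name and format string are illustrative choices, not taken from the original post.

```python
import logging
import sys


def enable_azure_debug_logging(stream=sys.stdout) -> logging.Logger:
    """Send DEBUG-level output from all `azure.*` loggers to `stream`."""
    logger = logging.getLogger("azure")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    # Timestamp, logger name, level, message: the same layout as the lines below.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Call `enable_azure_debug_logging()` once at startup, before creating the client. The AMQP frame traces seen in this thread additionally depend on passing `logging_enable=True` to the client, as the repro code does.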

2023-08-24 21:20:36,404 azure.identity._credentials.managed_identity INFO ManagedIdentityCredential will use workload identity
2023-08-24 21:20:39,359 azure.identity._internal.get_token_mixin DEBUG WorkloadIdentityCredential.get_token succeeded
2023-08-24 21:20:39,359 azure.identity._credentials.default INFO DefaultAzureCredential acquired a token from WorkloadIdentityCredential
2023-08-24 21:20:39,460 azure.servicebus._pyamqp.cbs DEBUG CBS update in progress. Token put time: 1692912039
2023-08-24 21:20:39,460 azure.servicebus._pyamqp.receiver DEBUG <- TransferFrame(handle=0, delivery_id=0, delivery_tag=b'\x01\x00\x00\x00', message_format=0, settled=None, more=False, rcv_settle_mode=None, state=None, resume=None, aborted=None, batchable=False, payload=b'***')
2023-08-24 21:20:39,460 azure.servicebus._pyamqp.cbs INFO CBS Put token error: b'amqp:unauthorized-access'
2023-08-24 21:20:39,510 azure.servicebus._pyamqp.link DEBUG -> DetachFrame(handle=2, closed=True, error=None)
2023-08-24 21:20:39,510 azure.servicebus._pyamqp.link INFO Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.management_link INFO Management link receiver state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.link DEBUG -> DetachFrame(handle=1, closed=True, error=None)
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.link INFO Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.management_link INFO Management link sender state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.session DEBUG -> EndFrame(error=None)
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.session INFO Session state changed: <SessionState.MAPPED: 3> -> <SessionState.END_SENT: 4>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp._connection DEBUG -> CloseFrame(error=None)
2023-08-24 21:20:39,511 azure.servicebus._pyamqp._connection INFO Connection state changed: <ConnectionState.OPENED: 9> -> <ConnectionState.CLOSE_SENT: 11>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp._connection INFO Connection state changed: <ConnectionState.CLOSE_SENT: 11> -> <ConnectionState.END: 13>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.session INFO Session state changed: <SessionState.END_SENT: 4> -> <SessionState.DISCARDING: 6>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.link INFO Link state changed: <LinkState.DETACH_SENT: 4> -> <LinkState.DETACHED: 0>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.management_link INFO Management link sender state changed: <LinkState.DETACH_SENT: 4> -> <LinkState.DETACHED: 0>
2023-08-24 21:20:39,511 azure.servicebus._pyamqp.link INFO Link state changed: <LinkState.DETACH_SENT: 4> -> <LinkState.DETACHED: 0>
2023-08-24 21:20:39,512 azure.servicebus._pyamqp.management_link INFO Management link receiver state changed: <LinkState.DETACH_SENT: 4> -> <LinkState.DETACHED: 0>
2023-08-24 21:20:40,106 azure.servicebus._base_handler INFO AMQP error occurred: (TokenAuthFailure('CBS Token authentication failed.\nStatus code: None')), condition: (<ErrorCondition.ClientError: b'amqp:client-error'>), description: (None).


kristapratico commented 1 year ago

@jonathan-fileread thanks for your issue, we'll take a look and get back to you as soon as possible. I'm tagging this as service bus since it looks like the get_token call succeeds and the auth fails in pyamqp, but @kashifkhan @swathipil @l0lawrence please re-triage if I misinterpreted the logs.

jonathan-fileread commented 1 year ago

Some additional files to provide info

serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: <id here>
  name: workload-identity-sa
  namespace: keda
  labels:
    azure.workload.identity/use: "true"

kashifkhan commented 1 year ago

@jonathan-fileread Thank you for the detailed information and repro steps :)

I would like to try out one thing to help narrow things down a bit:

Our Event Hub client, which uses the same AMQP library as Service Bus, has internal customers using Workload Identity properly.

jonathan-fileread commented 1 year ago

Thanks @kashifkhan for getting back so quickly.

Same issue it seems - however switching to uamqp_transport=True revealed that it is an IP issue. Looks like more logs were generated to aid in my discovery

azure.servicebus._base_handler INFO 'servicebus.pysdk-44fd747f' operation has exhausted retry. Last exception: ServiceBusAuthenticationError('CBS Token authentication failed.\nStatus code: 401\nDescription: Ip has been prevented to connect to the endpoint.\r\n For more information see:\r\n Virtual Network service endpoints:\r\n Event Hubs: https://go.microsoft.com/fwlink/?linkid=2044192\r\n Service Bus: https://go.microsoft.com/fwlink/?linkid=2044235\r\n IP Filters:\r\n Event Hubs: https://go.microsoft.com/fwlink/?linkid=2044428\r\n Service Bus: https://go.microsoft.com/fwlink/?linkid=2044183\r\n TrackingId:8c591eaf-33fe-456c-802e-d434d250d531_G7
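The transport switch mentioned above can be sketched as follows. This is an illustration under assumptions, not the exact code used in the thread: it assumes an azure-servicebus version whose client accepts the `uamqp_transport` keyword and that the separate `uamqp` package is installed. The helper names `client_kwargs` and `make_client` are hypothetical.

```python
def client_kwargs(namespace: str, use_uamqp: bool) -> dict:
    """Build keyword arguments for ServiceBusClient, optionally selecting uamqp."""
    kwargs = {"fully_qualified_namespace": namespace, "logging_enable": True}
    if use_uamqp:
        # Assumption: requires the separate `uamqp` package to be installed.
        kwargs["uamqp_transport"] = True
    return kwargs


def make_client(namespace: str, use_uamqp: bool = False):
    """Create the client; SDK imports are deferred until this is called."""
    from azure.identity import DefaultAzureCredential
    from azure.servicebus import ServiceBusClient

    return ServiceBusClient(
        credential=DefaultAzureCredential(),
        **client_kwargs(namespace, use_uamqp),
    )
```

In this thread the uamqp path surfaced a 401 with a description of the IP restriction, while the pure-Python path reported only `Status code: None`, so the switch served as a diagnostic step and not as a fix.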

We're currently working with a private AKS cluster and private servicebus with private endpoints. NSGs allow amqp ports on either side (they're on different subnets)
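A quick way to confirm the NSG claim from inside a pod is a plain TCP connect to the AMQP port. The sketch below uses only the standard library, and the function name is hypothetical. It checks TCP reachability only: a successful connect does not show that the namespace firewall accepts the source IP, which is the failure reported in the error above.

```python
import socket


def can_connect(host: str, port: int = 5671, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("namespacehere.servicebus.windows.net")` run from the AKS pod would test the path to port 5671, where the hostname is the placeholder used earlier in the thread.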

I'll investigate further. Thank you!

kashifkhan commented 1 year ago

Thanks for the info and log printout @jonathan-fileread. I want to improve the logging output of the Python AMQP library so that in the future we don't have to switch to uamqp :)

Fingers crossed it's an IP issue and we can get y'all back on the Python AMQP library instead of uamqp.

github-actions[bot] commented 1 year ago

Hi @jonathan-fileread. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

jonathan-fileread commented 1 year ago

@kashifkhan I think we can close this issue, as the problem is related to connectivity between the private AKS cluster and the private Service Bus namespace. The DNS resolvers are not resolving to the correct private endpoint IPs. Do you know which GitHub repo I should open the issue in?
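The resolution problem described above can be checked from inside a pod with a short standard-library script. This is a sketch under the assumption that a correctly configured private endpoint makes the namespace hostname resolve to a private-range address. The function names are hypothetical.

```python
import ipaddress
import socket


def resolved_addresses(host: str, port: int = 5671) -> list:
    """Return the sorted, de-duplicated IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_private(address: str) -> bool:
    """True if `address` falls in a private (non-public) IP range."""
    return ipaddress.ip_address(address).is_private


def report(host: str) -> None:
    """Print each resolved address and whether it is private or public."""
    for address in resolved_addresses(host):
        kind = "private" if is_private(address) else "public"
        print(f"{host} -> {address} ({kind})")
```

Calling `report("namespacehere.servicebus.windows.net")` from the AKS pod (placeholder hostname from the thread) and seeing a public address would be consistent with the resolver bypassing the private endpoint.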

kashifkhan commented 1 year ago

@jonathan-fileread there are 2 ways to reach out to them