Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Intermittent failure to get token under azure-core-1.30.2 #36276

Closed theringe closed 2 weeks ago

theringe commented 2 months ago

Describe the bug

Issue Definition: My function app uses a managed identity to communicate with Event Hubs, and it ran smoothly at first. However, errors occurred at several points in time, causing a large backlog of tasks waiting to be written to Event Hubs. Restarting the app restores normal operation, but that is not a good long-term solution.

Error Messages: AppServiceCredential.get_token failed: HTTP transport has already been closed. You may check if you're calling a function outside of the async with of your client creation, or if you called await close() on your client already.

Investigation: I located the error message in Application Insights. [image]

From the error message, I found that it originates from a change introduced in azure-core-1.30.2: "More explicit error message if transport is already closed" by lmazuel, Pull Request #35559, Azure/azure-sdk-for-python. [image]

To Reproduce

Steps to reproduce the behavior:

  1. My code is written almost entirely according to Microsoft's official documentation, "Send or receive events from Azure Event Hubs using Python" (Azure Event Hubs | Microsoft Learn).
  2. Deploy the code, with azure-core-1.30.2 and all the related SDKs, to an Azure Function App.
  3. The issue occurs intermittently.

Expected behavior

I have also verified that this invocation occurs approximately once per minute in my app, so I don't understand why this error occurs. It should not happen.

Screenshots

Please refer to what I've mentioned above.

Additional context

Please refer to what I've mentioned above.

kristapratico commented 2 months ago

Thanks for your issue @theringe, the team will take a look and get back to you as soon as possible. Any chance you could share the example code which reproduces this?

github-actions[bot] commented 2 months ago

Hi @theringe. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

theringe commented 2 months ago

> Thanks for your issue @theringe, the team will take a look and get back to you as soon as possible. Any chance you could share the example code which reproduces this?

Hello @kashifkhan @kristapratico, thank you.

This is our code snippet (with sensitive information removed): [image]

At the same time, I want to apologize. Previously, I suspected that the issue was with version 1.30.2 and therefore reported this bug. However, after downgrading to version 1.30.1, my Function App is still experiencing the same symptom, with the only difference being a change in the error message.

If you are willing, I hope you could kindly assist me in troubleshooting this issue. Thank you very much :)

PS: I found a related PR, https://github.com/Azure/azure-sdk-for-python/pull/30836, which might identify the cause.

Error Message:
EHProducer-7e1db23f-5bb4-43bc-9324-e16d5d7c670b has an exception (EventHubError('NoneType object has no attribute __aenter__\nNoneType object has no attribute __aenter__')). Retrying...
An error occurred when detaching the link: AMQPConnectionError(Error condition: ErrorCondition.InternalError\n Error Description: Link already closed.)
Management link sender state changed: <LinkState.DETACH_SENT: 4> -> <LinkState.DETACHED: 0>
Link state changed: <LinkState.DETACH_SENT: 4> -> <LinkState.DETACHED: 0>
An error occurred when detaching the link: AMQPConnectionError(Error condition: ErrorCondition.InternalError\n Error Description: Link already closed.)
Management link sender state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
Link state changed: <LinkState.DETACHED: 0> -> <LinkState.DETACHED: 0>
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
Management link sender state changed: <LinkState.DETACHED: 0> -> <LinkState.DETACHED: 0>
Management link receiver state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
Connection state changed: <ConnectionState.HDR_SENT: 2> -> <ConnectionState.OPEN_PIPE: 4>
Link state changed: <LinkState.ATTACHED: 3> -> <LinkState.DETACH_SENT: 4>
Link state changed: <LinkState.DETACHED: 0> -> <LinkState.DETACHED: 0>
AppServiceCredential.get_token failed: NoneType object has no attribute __aenter__
"Request URL: http://169.254.129.4:8081/msi/token?api-version=REDACTED&resource=REDACTED Request method: GET Request headers: X-IDENTITY-HEADER: REDACTED User-Agent: azsdk-python-identity/1.16.1 Python/3.10.14 (Linux-5.15.153.1-2.cm2-x86_64-with-glibc2.31) No body was attached to the request"

kashifkhan commented 2 months ago

@theringe which version of the eventhub library are you using? I would advise you to upgrade to the latest one (v5.12.1) and see if the issue persists.

The other pattern that usually causes this is concurrent access to the producer from multiple coroutines. To handle that situation, it is advised to use an async lock around the producer to ensure that only one coroutine accesses it at a time.
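
As an illustration of the locking advice, here is a minimal sketch that uses only the standard library. `StubProducer` and `send_guarded` are hypothetical names introduced for this example; the stub is not the real SDK client and only records how many calls overlap. With the real producer, the same `async with lock:` block would wrap each call on the shared client.

```python
import asyncio


class StubProducer:
    """Hypothetical stand-in for a shared producer client (not the SDK class)."""

    def __init__(self):
        self.in_flight = 0       # calls currently inside send_batch
        self.max_in_flight = 0   # highest overlap observed
        self.sent = []

    async def send_batch(self, batch):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0)   # yield so other coroutines could interleave
        self.sent.append(batch)
        self.in_flight -= 1


async def send_guarded(producer, lock, batch):
    # Only one coroutine at a time may touch the shared producer.
    async with lock:
        await producer.send_batch(batch)


async def main():
    producer = StubProducer()
    lock = asyncio.Lock()
    # Five coroutines compete for the same producer.
    await asyncio.gather(*(send_guarded(producer, lock, [i]) for i in range(5)))
    return producer


producer = asyncio.run(main())
```

The lock should be created once and shared by every coroutine that uses the producer; a lock created per call would not serialize anything.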

theringe commented 2 months ago

@kashifkhan Hello sir,

1) We are using eventhub-5.11.2.

2) I downloaded the whole SDK and found the following in "identity\azure-identity\README.md": [image]

It mentions that when I use an asynchronous credential, I need to close it manually. The official sample code also creates the credential inside an asynchronous code block. However, the credential in my code is not created inside an asynchronous code block, which differs from the sample: azure-sdk-for-python/sdk/eventhub/azure-eventhub/samples/async_samples/client_identity_authentication_async.py at azure-eventhub_5.12.1 (Azure/azure-sdk-for-python). [image]

Therefore, I suspect this could be the cause of the issue. Could you please help me check that?

Thanks.

lmazuel commented 1 month ago

Hi @theringe, I'm not sure this is enough to fix your problem, but there is a problem with your code. Because you create the credential before the producer, you need to close the producer before the credential. This code:

credential = DefaultAzureCredential()
async with producer:
    # Do something with producer
    # Problem: the credential is closed while the producer is still open
    await credential.close()

will close the credential before the producer, because the end of the async with block has not been reached yet. Some tasks may still be in progress, and closing the credential prevents them from finishing. That would produce exactly the message you are seeing (the credential is closed while something is still trying to use it).

I would first rewrite like this, and see what happens:

credential = DefaultAzureCredential()
async with producer:
    ...  # Do something with producer

# Keep this line outside of the producer context manager
await credential.close()
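
The same ordering can also be expressed with nested `async with` blocks, which exit in reverse order of entry. The following is only a sketch with standard-library stand-ins: `StubCredential` and `StubProducer` are hypothetical classes, and the stub producer requesting a token while it shuts down is an assumption made here to model work that is still in progress at close time. In real code the outer block would hold the async credential and the inner block the producer client.

```python
import asyncio

events = []  # records the order in which things happen


class StubCredential:
    """Hypothetical stand-in for an async credential."""

    def __init__(self):
        self.closed = False

    async def get_token(self):
        if self.closed:
            raise RuntimeError("credential already closed")
        return "token"

    async def close(self):
        self.closed = True
        events.append("credential closed")

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        await self.close()


class StubProducer:
    """Hypothetical stand-in for a producer that depends on the credential."""

    def __init__(self, credential):
        self._credential = credential

    async def send(self, payload):
        await self._credential.get_token()
        events.append(f"sent {payload}")

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        # Assumption: shutting down may still need a valid credential.
        await self._credential.get_token()
        events.append("producer closed")


async def main():
    # Created first, closed last: the credential outlives the producer.
    async with StubCredential() as credential:
        async with StubProducer(credential) as producer:
            await producer.send("event-1")


asyncio.run(main())
```

Because the credential block encloses the producer block, the producer finishes closing before the credential is closed, without an explicit `close()` call.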

github-actions[bot] commented 3 weeks ago

Hi @theringe, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

theringe commented 3 weeks ago

Hello @lmazuel, thank you so much for pinpointing that. I was able to fix the issue after making that code-level change.

Sorry for the late reply.

github-actions[bot] commented 3 weeks ago

Hi @theringe. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

github-actions[bot] commented 2 weeks ago

Hi @theringe, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.