**[Open]** dougli opened this issue 1 month ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @EldertGrootenboer.
Thank you for the feedback @dougli. As you know, the library is not currently coroutine safe, so our recommendation to users is to take a lock when accessing the producer, as in your repro.
Thanks @dougli for opening this issue and pinpointing the exact line where the bug occurs, I'm currently dealing with the same thing on a production service we're hosting. @kashifkhan any suggestions on how to implement that lock? Thanks in advance.
@salustiana this is my implementation; it seems to have resolved our issues.
```python
import asyncio
import uuid

from azure.identity.aio import DefaultAzureCredential
from azure.servicebus.aio import ServiceBusClient
from azure.servicebus import ServiceBusMessage


class MyTopicSender:
    """Wraps a shared ServiceBusSender and serializes access with an asyncio.Lock."""

    def __init__(self, topic_sender):
        self.lock = asyncio.Lock()
        self.topic_sender = topic_sender

    async def send_messages(self, task_id):
        for _ in range(100):
            # Only one coroutine at a time may touch the sender, so the
            # SDK's connection flow is never entered concurrently.
            async with self.lock:
                service_bus_message = ServiceBusMessage(
                    f"Hello-World-{task_id}",
                    subject=f"received-v1/{uuid.uuid4()}",
                    correlation_id=str(uuid.uuid4()),
                    message_id=str(uuid.uuid4()),
                )
                await self.topic_sender.send_messages(service_bus_message)


async def main():
    credential = DefaultAzureCredential()
    client = ServiceBusClient(
        fully_qualified_namespace="my-servicebus.servicebus.windows.net",
        credential=credential,
        retry_total=10,
        retry_mode="exponential",
    )
    # Reuse a single sender across all tasks instead of opening one per send.
    topic_sender = client.get_topic_sender(topic_name="my-topic")
    my_topic_sender = MyTopicSender(topic_sender)

    tasks = [my_topic_sender.send_messages(i) for i in range(10)]
    await asyncio.gather(*tasks)


asyncio.run(main())
```
@kashifkhan I'd appreciate any input if you think it could be improved.
@nickpetzold thanks a lot man, appreciate it.
**Describe the bug**
Connections and sessions to Service Bus are extremely expensive to set up, taking 0.5–1.5 s to initialize and tear down. Reusing the `ServiceBusSender` object mitigates this, but a race condition in the SDK connection flow causes exceptions:
**To Reproduce**
Steps to reproduce the behavior:
**Expected behavior**
`async` SDKs should be async safe and throw no exceptions.

**Additional context**
I've found the smoking gun for this bug: a race condition near line 222 in `_servicebus_sender_async.py`. Here's the relevant code:
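(Paraphrased below; only `self._handler`, `self._running`, and `client_ready_async` are named in this issue, so the remaining identifiers are my approximations rather than verbatim SDK source.)

```python
# Approximate sketch of the sender's _open() flow -- not verbatim SDK source.
async def _open(self):
    if self._running:
        return
    if self._handler:
        # The culprit if-check: closing the old handler calls up into a
        # superclass helper that unsets self._handler (sketched below).
        await self._close_handler()
    self._handler = self._create_handler()  # approximation of handler setup
    await self._handler.open_async()
    # Indeterminate state: self._running is still False but self._handler is
    # set, so a second concurrent _open() passes both checks above and can
    # close and null out the handler while we are still polling here.
    while not await self._handler.client_ready_async():
        await asyncio.sleep(0.05)
    self._running = True
```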
Even though it seems impossible that `self._handler` would be `None` on the `client_ready_async` call, since the previous line just used it successfully, these are all async functions, so other async code has a chance to unset `self._handler` elsewhere. The culprit is the if-check right at the top of that code block:
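Sketched under the same caveat (the helper's name and body here are my approximations):

```python
# Hypothetical sketch of the base-handler close path the if-check invokes.
async def _close_handler(self):
    if self._handler:
        await self._handler.close_async()
        self._handler = None  # the unset that the first caller trips over
    self._running = False
```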
This calls up into a superclass method that unsets `self._handler`. While a connection is starting, we're in an indeterminate state where `self._running` is `False` but `self._handler` is set. If another parallel call enters this code during that window, it disconnects the handler and nulls it out while the first call is still waiting in the while-loop.
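To make the interleaving concrete, here is a self-contained toy demo of my own (no Azure SDK involved): two coroutines race through the same open sequence, and one nulls the shared handler at an `await` point while the other is still mid-flight.

```python
import asyncio


class FakeSender:
    """Toy stand-in mirroring the _open() flow sketched above."""

    def __init__(self):
        self._running = False
        self._handler = None

    async def _close_handler(self):
        self._handler = None    # unset the handler...
        await asyncio.sleep(0)  # ...then yield, as any real close would at its awaits
        self._running = False

    async def _open(self, name):
        if self._running:
            return
        if self._handler:       # the culprit if-check
            await self._close_handler()
        self._handler = object()
        await asyncio.sleep(0)  # yield point: the other _open() runs here
        if self._handler is None:
            print(f"{name}: handler was unset by a concurrent _open()")
        else:
            self._running = True


async def main():
    sender = FakeSender()
    # Start both opens concurrently; one of them loses its handler.
    await asyncio.gather(sender._open("A"), sender._open("B"))


asyncio.run(main())
```

Run it and one coroutine reports its handler was unset by the other, which is exactly the window the lock in the workaround above closes.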