Azure / azure-service-bus

☁️ Azure Service Bus service issue tracking and samples
https://azure.microsoft.com/services/service-bus
MIT License

Session lock does not expire if SessionReceiver's PrefetchSize is larger than the number of messages in the session #694

Open michaelmcmaster opened 5 months ago

michaelmcmaster commented 5 months ago

Description

If the Service Bus receiver uses a PrefetchSize that is larger than the number of messages in the Service Bus session, the session lock seems to be renewed automatically (and indefinitely) until the client connection is closed. This allows a malfunctioning client to "hang" a session indefinitely.

Related Observations

I cannot determine if the session lock is being automatically renewed by the client or by something server-side. I don't see any activity in the client logs that indicates the client is (automatically) renewing the session lock. The session lock is released (on the server) if the connection between the client and the server is severed.

This issue appears to be related to the AMQP transport. Running a similar test using the older WindowsAzure.ServiceBus client with the SBMP transport works as expected (i.e. the session lock is lost regardless of the number of messages), but switching the transport to AMQP behaves identically to the Azure.Messaging.ServiceBus client (i.e. the session lock doesn't expire, as outlined in this issue).

Recreate

I posted a Visual Studio 2022 solution (console application) that recreates the issue to GitHub. Informational logs are written to the console, while trace logs are written to a file in the working directory.

This application can be pointed to a ServiceBus, and it will:

ISSUE: In this scenario, CompleteMessage should always result in a SessionLockLost exception. However, if the PrefetchSize is larger than the number of messages in the session, the session lock is never lost (the session remains locked indefinitely) and the message(s) are successfully completed (removed from the queue).

Command Line Options

-c, --connection    Required. Service Bus connection string (Manage, Send, Listen)
-m, --messages      (Default: 1) Number of messages to put into Service Bus (single session)
-p, --prefetch      (Default: 2) Service Bus receive prefetch size
-q, --queue         (Default: session_lock_failure) Service Bus queue name
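
For readers who want to script the two scenarios, the options above can be mirrored in a few lines. This is only a sketch in Python's argparse: the original tool is a .NET console application, and `build_parser` and `prefetch_exceeds_messages` are hypothetical names introduced here. The second helper simply encodes the condition from the issue title (prefetch larger than the number of messages).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options and defaults (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="SessionLockFailure")
    parser.add_argument("-c", "--connection", required=True,
                        help="Service Bus connection string (Manage, Send, Listen)")
    parser.add_argument("-m", "--messages", type=int, default=1,
                        help="Number of messages to put into Service Bus (single session)")
    parser.add_argument("-p", "--prefetch", type=int, default=2,
                        help="Service Bus receive prefetch size")
    parser.add_argument("-q", "--queue", default="session_lock_failure",
                        help="Service Bus queue name")
    return parser


def prefetch_exceeds_messages(args: argparse.Namespace) -> bool:
    """True for the configuration the report describes as failing."""
    return args.prefetch > args.messages


if __name__ == "__main__":
    # Same options as the failing run in the report: -m 1 -p 2.
    args = build_parser().parse_args(["-c", "<connection string>", "-m", "1", "-p", "2"])
    print(args.queue, prefetch_exceeds_messages(args))
```

With the defaults (1 message, prefetch 2) the helper returns True, which matches the failing configuration reported in this issue.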

Scenario 1 (OK) : messages >= prefetch

In this scenario, the Service Bus behaves according to the official documentation. During the delay, the server expires the session lock, and a SessionLockLost exception is thrown when the client attempts (after the delay) to complete the messages.

Command Line: SessionLockFailure.exe -c "******" -m 2 -p 2

2024-01-26T15:34:45.8599208-06:00 [INF] [1] SessionLockFailure running
2024-01-26T15:34:47.8922086-06:00 [INF] [10] ServiceBus client connected
2024-01-26T15:34:47.9612183-06:00 [INF] [10] Sending partial batch: [1]
2024-01-26T15:34:48.4635329-06:00 [INF] [5] Sent [2] messages in [0.57] seconds (3.49 msg/s).
2024-01-26T15:34:48.5755560-06:00 [INF] [10] AcceptNextSession: SessionId:[0], LockedUntil:[2024-01-26T15:35:03.5221389-06:00]
2024-01-26T15:34:48.6002566-06:00 [INF] [7] Delay:[00:00:14.9220584] to allow session lock to expire
2024-01-26T15:35:03.5301752-06:00 [INF] [7] Delay:[00:05:00] for extra measure
2024-01-26T15:40:03.5212820-06:00 [INF] [34] CompleteMessage: SessionId:[0], SequenceNumber:[1]
2024-01-26T15:40:03.5328858-06:00 [WRN] [34] CompleteMessage: Session lock lost (expected)
Azure.Messaging.ServiceBus.ServiceBusException: The session lock has expired on the MessageSession. Accept a new MessageSession. TrackingId:*****, SystemTracker:***:***:amqps://******/***;0:7:8:source(address:/session_lock_failure,filter:[com.microsoft:session-filter:]), Timestamp:2024-01-26T21:35:03 (SessionLockLost).
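
The timing in this log can be checked by hand: the first delay should be roughly the time remaining until LockedUntil, and CompleteMessage should land well after the lock expired. The sketch below does that arithmetic in Python using the timestamps copied from the log above. `parse_log_timestamp` is a hypothetical helper; it trims the 7-digit fractional seconds to microseconds because older Python versions do not accept more than 6 digits in `datetime.fromisoformat`.

```python
import re
from datetime import datetime

# Matches timestamps such as 2024-01-26T15:35:03.5221389-06:00
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})$"
)


def parse_log_timestamp(text: str) -> datetime:
    """Parse a log timestamp, truncating the fraction to microseconds."""
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    base, fraction, offset = match.groups()
    micros = fraction[:6].ljust(6, "0")
    return datetime.fromisoformat(f"{base}.{micros}{offset}")


# Values copied from the Scenario 1 log.
locked_until = parse_log_timestamp("2024-01-26T15:35:03.5221389-06:00")
delay_logged_at = parse_log_timestamp("2024-01-26T15:34:48.6002566-06:00")
complete_logged_at = parse_log_timestamp("2024-01-26T15:40:03.5212820-06:00")

# Time left on the lock when the first delay was logged (just under 15 s).
remaining = locked_until - delay_logged_at
# How long after lock expiry the completion was attempted (just under 5 min).
overdue = complete_logged_at - locked_until

print("remaining:", remaining)
print("overdue:", overdue)
```

The remaining time comes out slightly below the logged delay of 00:00:14.9220584, presumably because the application computed its delay a moment before writing the log line. If the Timestamp in the exception is UTC, 21:35:03 corresponds to 15:35:03-06:00, which agrees with LockedUntil.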

Scenario 2 (Failure) : messages < prefetch

In this scenario, the Service Bus misbehaves: the session lock is held indefinitely. During the delay, the server does not expire the session lock. The server-side session lock is being maintained indefinitely by the client connection, so the session stays stalled until that connection is terminated. This can be further confirmed by attempting an AcquireNextSession + Receive (e.g. from ServiceBusExplorer) during the delay period. When the client attempts (after the delay) to complete the messages, they are completed successfully, but they shouldn't be, because the session lock should have been lost.

Command Line: SessionLockFailure.exe -c "*****" -m 1 -p 2

2024-01-26T14:48:22.8077862-06:00 [INF] [1] SessionLockFailure running
2024-01-26T14:48:25.0585689-06:00 [INF] [10] ServiceBus client connected
2024-01-26T14:48:25.1456060-06:00 [INF] [10] Sending partial batch: [1]
2024-01-26T14:48:25.7762166-06:00 [INF] [10] Sent [1] messages in [0.72] seconds (1.39 msg/s).
2024-01-26T14:48:25.8877994-06:00 [INF] [10] AcceptNextSession: SessionId:[0], LockedUntil:[2024-01-26T14:48:40.7684016-06:00]
2024-01-26T14:48:25.9198234-06:00 [INF] [10] Delay:[00:00:14.8487970] to allow session lock to expire
2024-01-26T14:48:40.7726083-06:00 [INF] [10] Delay:[00:30:00] for extra measure
2024-01-26T15:18:40.7755738-06:00 [INF] [137] CompleteMessage: SessionId:[0], SequenceNumber:[1]
2024-01-26T15:18:41.9229045-06:00 [ERR] [138] FAILURE: The sesion lock *should* have been lost, but was not

EldertGrootenboer commented 4 months ago

Thank you for your feedback. We have opened an investigation task for this in our backlog, and will update this issue when we have more information.

EldertGrootenboer commented 2 months ago

This item is in our backlog; however, we currently don't have an ETA for when development might start on it. For now, to help us give this the right priority, it would be helpful to see others vote for and support this item.