MassTransit / MassTransit

Distributed Application Framework for .NET
https://masstransit.io
Apache License 2.0
6.93k stars 1.61k forks source link

Response queue on Azure Service Bus does not get recreated after it has been deleted #2256

Closed sanderaernouts closed 3 years ago

sanderaernouts commented 3 years ago

Is this a bug report?

Yes

Can you also reproduce the problem with the latest version?

Yes

Environment

  1. Dotnet version: ASP.net Core 3.1
  2. Operating system: Windows
  3. IDE (Rider, VS Code, Visual Studio): n/a

Steps to Reproduce

  1. Start the ASP.net core application
  2. Make a request with a RequestClient
  3. All if fine
  4. Wait for the response queue to be deleted
  5. Make a request with a RequestClient
  6. Response queue is not found (and not recreated)

Expected Behavior

Whenever a (responds) queue is deleted automatically it should be recreated again when needed, for example when invoking a RequestClient

Actual Behavior

After some time this queue is deleted, once this happens the queue is not recreated when the RequestClient is invoked which results in this error in our Azure Function:

masstransit put token failed. status-code: 404, status-description: The messaging entity 'sb://<bus>.servicebus.windows.net/<computername>_bus_1yjyyy895cyyg1kkbdcm3n418q' could not be found.

Reproducible Demo

This is how we configured Masstransit in our ASP.net core application

services.AddMassTransit(mt =>
{

    mt.UsingAzureServiceBus((context, cfg) =>
    {
        cfg.Host(connectionStrings.AzureServiceBusConnection);
    });

    mt.AddRequestClient<SomeCommand>(new Uri("queue:<queue>"));
});

services.AddMassTransitHostedService();

And then we invoke it like this:

var (created, rejected) = await client.GetResponse<CreatedEvent, RejectedEvent>(new SomeCommand
            {
                             ......
            }, timeout: TimeSpan.FromMinutes(1));

this works fine until the application has been running for a while, I don't know the exact time. Restarting the application resolves the issue until the queue is deleted again.

gldraphael commented 3 years ago

@sanderaernouts - Are you able to reliably reproduce this?

I've created a little demo you can run with:

git clone --single-branch --branch mt-2256 https://github.com/gldraphael/mt-req-response-exp.git
cd mt-req-response-exp
# TODO: Update the .env file to use Transport="AzureServiceBus", and set the connection string
docker-compose build
docker-compose up

Check the code at ClientApp/Program.cs and make changes as needed.

I'm only able to reproduce this if I manually go in and delete the temporary queue.
The temporary queue should -- if I understand correctly -- only get auto-deleted when the connection is closed.
So what you're seeing is likely a network problem and not a MT bug.

sanderaernouts commented 3 years ago

@gldraphael wow thanks for putting that together, works extremely well for a reproduction scenario! 👍. I can indeed reproduce the issue by manually deleting. However, after I delete it I do see that the temporary queue is recreated after some time. I'm not sure if this is intentionally part of your reproduction scenario or whether MassTransit is recovering from the error?

The temporary queue should -- if I understand correctly -- only get auto-deleted when the connection is closed

Shouldn't that also mean that the temporary queue gets recreated once the connection is opened again? What we were seeing is that the queue got deleted but the ASP.net Core app was still running. The RequestClient kept sending the URI of the deleted temporary queue in the message headers so the recipient tried to reply to a non-existing queue. I did see a temporary queue on our service bus, it was just a different one.

Could this be a caching problem? Meaning that the name of the old temporary queue is kept somewhere even though a new queue is created? Restarting the ASP.net Core application resolved it because a new temporary queue was created.

phatboyg commented 3 years ago

The queue that gets recreated after you delete the temporary queue, does it have the same name? It should, which is why I ask.

sanderaernouts commented 3 years ago

It doesn't but I'm guessing that is because we restarted the App Service in Azure which may mean it ends up on a different host. If I remember correctly the hostname is part of the queue name right?

I just checked our DEV environment and noticed that the temporary queue was gone. Since our application is hosted on App Services (which has IIS underneath I believe) this is probably due to the application process recycling as a result of inactivity. So I triggered the RequestClient and I saw that the queue got created (rd0003ffdb9012_w3wp_bus_1yjyyy895cyy8ay8bdcm79tfrt) which is what I would expect.

When I created this issue there was a temporary queue in our service bus, both in DEV and in PROD, but it was not the queue that was being used by the RequestClient when sending the messages, that temporary queue did not exist (anymore).

gldraphael commented 3 years ago

I'll re-check on Monday... but when I last checked it did recreate the temporary queue...

Galdin Raphael

On Sun, 24 Jan 2021 at 12:16, Sander Aernouts notifications@github.com wrote:

It doesn't but I'm guessing that is because we restarted the App Service in Azure which may mean it ends up on a different host. If I remember correctly the hostname is part of the queue name right?

I just checked our DEV environment and noticed that the temporary queue was gone. Since our application is hosted on App Services (which has IIS underneath I believe) this is probably due to the application process recycling as a result of inactivity. So I triggered the RequestClient and I saw that the queue got created ( rd0003ffdb9012_w3wp_bus_1yjyyy895cyy8ay8bdcm79tfrt) which is what I would expect.

When I created this issue there was a temporary queue in our service bus, both in DEV and in PROD, but it was not the queue that was being used by the RequestClient when sending the messages, that temporary queue did not exist (anymore).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MassTransit/MassTransit/issues/2256#issuecomment-766301497, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACSLJZKYOMN6WN2KJLJP6J3S3O637ANCNFSM4WK2TJCA .

phatboyg commented 3 years ago

So you know, the temporary queue is not created until the first request is sent.

gldraphael commented 3 years ago

@phatboyg - The temporary queue does get created. But I'm seeing the bug.

// full code at https://github.com/gldraphael/mt-req-response-exp/blob/8191a6b93864076f1bbc6438e384a8fed5666967/ClientApp/Program.cs
await Run(numberOfRequests: 1);
Log.Information("Manually delete the temporary queue now.");
await Task.Delay(TimeSpan.FromMinutes(1));
Log.Information("Continuing execution...");
await Run(numberOfRequests: 1);
Log.Information("Re-running...");
await Run(numberOfRequests: 1);

Here's the log (without stacktrace):

worker_1  | [14:22:41 INF] Starting the bus...
worker_1  | [14:22:50 INF] {"Id": 1, "Name": "Eduardo Herman", "Phone": "839-671-2932", "Email": "Eduardo.Herman62@yahoo.com", "CountryCode": "GS", "$type": "GetInfoResponse"}
client_1  | [14:22:50 INF] [1]: {"Id": 1, "Name": "Eduardo Herman", "Phone": "839-671-2932", "Email": "Eduardo.Herman62@yahoo.com", "CountryCode": "GS", "$type": "GetInfoResponse"}
client_1  | [14:22:50 INF] Manually delete the temporary queue now.
client_1  | [14:23:07 ERR] Exception on Receiver sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi?autodelete=300 during Receive ActiveDispatchCount(0) ErrorRequiresRecycle(True)
client_1  | Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException: Put token failed. status-code: 404, status-description: The messaging entity 'sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.  TrackingId:f209fe44-9de5-40e8-9aa9-956c188f092f_G7, SystemTracker:redacted.servicebus.windows.net:4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi, Timestamp:2021-01-25T14:23:07.
client_1  | [14:23:07 ERR] Connection Failed: sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi?autodelete=300
client_1  | MassTransit.Azure.ServiceBus.Core.ServiceBusConnectionException: ReceiveTransport faulted: sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi?autodelete=300
client_1  |  ---> System.AggregateException: One or more errors occurred. (One or more errors occurred. (Put token failed. status-code: 404, status-description: The messaging entity 'sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.  TrackingId:f209fe44-9de5-40e8-9aa9-956c188f092f_G7, SystemTracker:redacted.servicebus.windows.net:4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi, Timestamp:2021-01-25T14:23:07.))
client_1  |  ---> System.AggregateException: One or more errors occurred. (Put token failed. status-code: 404, status-description: The messaging entity 'sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.  TrackingId:f209fe44-9de5-40e8-9aa9-956c188f092f_G7, SystemTracker:redacted.servicebus.windows.net:4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi, Timestamp:2021-01-25T14:23:07.)
client_1  |  ---> Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException: Put token failed. status-code: 404, status-description: The messaging entity 'sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.  TrackingId:f209fe44-9de5-40e8-9aa9-956c188f092f_G7, SystemTracker:redacted.servicebus.windows.net:4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi, Timestamp:2021-01-25T14:23:07.
client_1  | [14:23:08 ERR] Exception on Receiver sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi?autodelete=300 during Receive ActiveDispatchCount(0) ErrorRequiresRecycle(True)
client_1  | Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException: Put token failed. status-code: 404, status-description: The messaging entity 'sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.  TrackingId:f5163e39-1900-4b51-821b-5073fbb15b23_G7, SystemTracker:redacted.servicebus.windows.net:4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi, Timestamp:2021-01-25T14:23:08.
client_1  | [14:23:50 INF] Continuing execution...
worker_1  | [14:23:52 INF] {"Id": 1, "Name": "Cindy Hayes", "Phone": "670-577-0426", "Email": "Cindy16@yahoo.com", "CountryCode": "LB", "$type": "GetInfoResponse"}
worker_1  | [14:23:56 ERR] R-FAULT sb://redacted.servicebus.windows.net/mt-req-resp-queries 00030000-c0a8-0242-6d4b-08d8c13cd62b CommonLib.MessageContracts.GetInfo WorkerApp.GetInfoConsumer(00:00:02.1159082)
worker_1  | Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException: Put token failed. status-code: 404, status-description: The messaging entity 'sb://redacted.servicebus.windows.net/4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.  TrackingId:094bb7ba-7eda-4293-9ad8-1ff01ab58344_G6, SystemTracker:redacted.servicebus.windows.net:4b0501f6da79_ClientApp_bus_yyboyygyiybrfcoibdccnxfpbi, Timestamp:2021-01-25T14:23:52.
client_1  | [14:25:50 ERR] Timeout waiting for response, RequestId: 00030000-c0a8-0242-598f-08d8c13cd62b
client_1  | MassTransit.RequestTimeoutException: Timeout waiting for response, RequestId: 00030000-c0a8-0242-598f-08d8c13cd62b
client_1  | [14:25:50 INF] [1]: null
client_1  | [14:25:50 INF] Re-running...
worker_1  | [14:25:52 INF] {"Id": 1, "Name": "Brian Bergnaum", "Phone": "290.235.7183", "Email": "Brian97@hotmail.com", "CountryCode": "BY", "$type": "GetInfoResponse"}
client_1  | [14:27:50 ERR] Timeout waiting for response, RequestId: 00030000-c0a8-0242-2570-08d8c13d1db2
client_1  | MassTransit.RequestTimeoutException: Timeout waiting for response, RequestId: 00030000-c0a8-0242-2570-08d8c13d1db2
client_1  | [14:27:50 INF] [1]: null
  1. After the queue was manually deleted, a temporary queue with the correct name does get created when a message is sent. (After "Continuing execution..." in the log above")
  2. The temporary queue ends up having a Fault<> message. Is this expected?
  3. ~The _error queue ends up having a non-fault message. Are the messages in the temporary queue & the error queue swapped?~

I've attached the full stacktrace and messages below:

mt-req-resp-full-log-01.txt

temp-queue-getinfo-fault.json.txt

mt-req-resp-queries_error.json.txt

sanderaernouts commented 3 years ago

Thanks, @phatboyg! Glad you were able to figure out what the problem was :). And Thanks @gldraphael for helping with the repro of this issue 👍

zliebersbach commented 3 years ago

Hi, I have just hit this issue as well. It seems like MassTransit is trying to access a temporary(?) message queue which no longer exists.

For context, our services have been running on AKS for ~1 month and have been experiencing the bug for a few hours now. After restarting the service, it seems to have mitigated itself.

How did you solve the problem?

roycornelissen commented 2 years ago

@phatboyg this issue is still popping up. At some point, requests start timing out, and we see this message in our logging:

masstransit put token failed. status-code: 404, status-description: The messaging entity 'sb://<bus>.servicebus.windows.net/<computername>_bus_1yjyyy895cyyg1kkbdcm3n418q' could not be found.

As @zliebersbach mentioned last year, restarting the app alleviates the problem for a while, but it pops up again later. Any idea where to start looking for the root cause?

As a counter measure we are planning to remove the use of IRequestClient from our application, but I think it would be good to find the root cause and address it.

zliebersbach commented 2 years ago

@roycornelissen Yes, this bug still exists for us today more than a year later and we have to set a reminder to restart our application at least once a month.... not a great solution!

This issue should be reopened.

phatboyg commented 2 years ago

The queue is deleted by Azure, not MassTransit. If your service is running for that long you could just change the AutoDeleteOnIdle value to something higher like one month.

Azure states in the documentation that the idle time is based upon not having a receiver connected. MassTransit has a receiver connected. So there isn't anything MassTransit can do about it.

Increase the idle timeout and problem should occur less frequently.