Open deadwards90 opened 5 years ago
Would you be able to share details from the error messages you receive? Even something like this with the TrackingId and Timestamp would be awesome:
The request was routed to the listener but the listener did not respond in the required time. TrackingId:abf5350a-c43b-43db-b12f-8c905b826c5e_G1, SystemTracker:{REMOVE_THIS_IF_YOU_WANT}, Timestamp:2019-09-23T17:15:39
Or you can email me directly: dstucki (at) microsoft.com
There are some internal issues that are currently being ironed out which unfortunately lead to errors like "The request was routed to the listener but the listener did not respond" even though the listener did indeed respond. We're actively working on getting those fixed. If the HTTP request or response is > 64KB then this issue would not be seen.
Thanks for getting back to me.
Where can I find the Tracking Id? I'm using App Insights and an Azure function to do the communication with the listeners.
Well, there have been a few "known" issues happening in the last day or so. That was going to be one way to check if you were in one of the locations where those had been occurring without asking you for your X.servicebus.windows.net namespace. I could look it up given a TrackingId like that, given your namespace, or you could run nslookup X.servicebus.windows.net
(replacing "X" with your namespace) and tell me the *.cloudapp.net dns name.
e.g. nslookup dstucki-relay-usw.servicebus.windows.net
gives ns-sb2-prod-by3-003.cloudapp.net.
I'm getting ns-sb2-prod-ln1-001.cloudapp.net
Was this ever resolved?
@wmmihaa Not to my knowledge, but I have since left the place I was working so I don't know if they ever found a workaround.
Hey Guys,
it looks like we observe the same/ similar issue here. Message source is a Azure EventGrid however.
Even if we strip down the handler to the very basic:
private void RequestHandler(RelayedHttpListenerContext context) { try { var content = new StreamReader(context.Request.InputStream).ReadToEnd(); _logger.LogInformation(content); context.Response.StatusCode = HttpStatusCode.OK; context.Response.Close(); } catch (Exception e) { _logger.LogError(e, "Error on EventGridHandler RequestHandler"); } }
we seeing successful delivery attempts for around 1-2 minutes. Thereafter we observing a raising handle and thread count (at process level) followed by event TaskCanceledExceptions.
System.Threading.Tasks.TaskCanceledException: A task was canceled. at Microsoft.Azure.Relay.HybridConnectionListener.ControlConnection.SendCommandAndStreamAsync(ListenerCommand command, Stream stream, CancellationToken cancelToken) at Microsoft.Azure.Relay.HybridHttpConnection.SendResponseAsync(ResponseCommand responseCommand, Stream responseBodyStream, CancellationToken cancelToken) at Microsoft.Azure.Relay.HybridHttpConnection.ResponseStream.CloseAsync() at Microsoft.Azure.Relay.HybridHttpConnection.ResponseStream.CloseAsync() at Microsoft.Azure.Relay.RelayedHttpListenerResponse.CloseAsync() at Microsoft.Azure.Relay.RelayedHttpListenerResponse.Close() at DisturbanceAnnotation.EventGrid.EventGridHandler.RequestHandler(RelayedHttpListenerContext context)
from time to time the "Connecting" Handler is called in this moment the "LastedError" Property contains:
Microsoft.Azure.Relay.RelayException: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseSent'. TrackingId:7786bf4f-800d-4f6f-afb9-01b7710f2767, Address:sb://"not for public :) ", Timestamp:9/3/2021 8:52:21 PM ---> System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseSent' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) at System.Net.WebSockets.ManagedWebSocket.ReceiveAsync(ArraySegment`1 buffer, CancellationToken cancellationToken) --- End of stack trace from previous location --- at Microsoft.Azure.Relay.HybridConnectionListener.ControlConnection.ReceivePumpCoreAsync() --- End of inner exception stack trace ---
In AzureDiagnostics following exceptions can be found
ResourceProvider: MICROSOFT.RELAY OperationName: Microsoft.Relay/HybridConnections/ListenerReturningHttpResponseFailed Message:_ Returning HTTP response with failure response code: NotFound due to: EndpointNotFoundException: There are no listeners connected for the endpoint. TrackingId:9989a2b8-47f1-4871-9b56-5c006a6f4ef1_G5, SystemTracker:"not for public :) ", Timestamp:2021-09-04T08:49:05
However we still get new "RequestHandler" callback invocations. I think the problem might be related to the non blocking Task.Run in "Microsoft.Azure.Relay.HybridHttpConnection" -> CreateAsync() that might lead to a thread pool exhaustion if the "CloseAsync()" calls takes longer than expected
public static async Task CreateAsync(HybridConnectionListener listener, ListenerCommand.RequestCommand requestCommand, WebSocket controlWebSocket) { var hybridHttpConnection = new HybridHttpConnection(listener, controlWebSocket, requestCommand.Address);
// In this method we're holding up the listener's control connection. // Do only what we need to do (receive any request body from control channel) and then let this Task complete. bool requestOverControlConnection = requestCommand.Body.HasValue; var requestAndStream = new RequestCommandAndStream(requestCommand, null); if (requestOverControlConnection) { requestAndStream = await hybridHttpConnection.ReceiveRequestBodyOverControlAsync(requestCommand).ConfigureAwait(false); } // ProcessFirstRequestAsync runs without blocking the listener control connection: Task.Run(() => hybridHttpConnection.ProcessFirstRequestAsync(requestAndStream)).Fork(hybridHttpConnection); }
Hi @TorstenL, sorry it's taken so long to respond. Can you update the RequestHandler to do the following?
// Note: When using 'async void' on a method it's crucial to not allow unhandled exceptions
// (addressed by the catch block at the end)
async void RequestHandler(RelayedHttpListenerContext context)
{
try {
using (var streamReader = new StreamReader(context.Request.InputStream)) { // Item 2
string content = await streamReader.ReadToEndAsync().ConfigureAwait(false); // Item 3, Item 5
_logger.LogInformation(content); // Item 4 (review)
}
context.Response.StatusCode = HttpStatusCode.OK;
await context.Response.CloseAsync().ConfigureAwait(false); // Item 1, Item 5
}
catch (Exception e) {
_logger.LogError(e, "Error on EventGridHandler RequestHandler");
}
}
Try with these modifications and let me know if anything changes or the problem persists.
Hi @dlstucki, thanks for your answer. Nice to see that this library hasn't be abandoned. :-)
Well i would love to use async processing in the requesthandler.
But as the HybridConnectionListeners RequestHandler callback handler signature is Action<RelayedHttpListenerContext> RequestHandler
- i would not expect that i have to use a new not awaited task. :-)
Please let me quickly sketch out our use case. We’re using the HybridConnectionListener as “Proxy” between our webapp and a azure eventgrid. Naturally the EventGrid subscription stays active the whole time as it statically configured in the azure portal. This might however result in a situation where thousands of small undelivered messages are piled up in the subscription queue if the webapp is not available for some time. Which was the case as we observed this problem.
Please correct me if i’m wrong, but if I look in the HybridConnectionListener implementation it looks to me, that for every single message a new not awaited task is created.
Wouldn't this lead to either timeouts if it takes too long to process a message as of a tremendous amount of parallel running tasks or thread pool starvation if synchronous calls are used ?
@TorstenL, The thought behind HybridConnectionListener.RequestHandler being merely an Action
The HybridConnectionListener SDK also uses the .NET ThreadPool worker threads for it's various internal async operations. If anyone is taking too long to process items or making synchronous calls which take time on the .NET ThreadPool worker threads then Relay/HybridConnectionListener will also be affected by threadpool starvation. Also worth noting, this library had supported UWP which didn't even allow use of the System.Threading.ThreadPool class (See ActionItem.cs).
Please correct me if i’m wrong, but if I look in the HybridConnectionListener implementation it looks to me, that for every single message a new not awaited task is created.
What you describe sounds accurate. We use Task.Run to queue a workitem to the .NET ThreadPool for invoking the user's RequestHandler so that the HybridConnectionListener's "control connection" can start looking for the next message from the cloud service in parallel. Is your concern that there's no way for a listener implementation to introduce back-pressure leading to the listener processing code slowing down? I suppose throttling like that might be a missing feature. Something like "MaxPendingRequests"?
@dlstucki Yes that makes sense. And yes, something like "MaxPendingRequests" would be exactly the missing feature which would be very helpful in our (HybridConnection<->EventGrid) usecase.
I have a .NET Core Windows Service that connects to two Hybrid Connections with code similar to the following:
Azure Relay Listener - This does the actual handling of the request. https://gist.github.com/deadwards90/5e5b9972ac87af1b48e9c9873e6d50e7
Worker - Implementation of the HostedService, starts a collection of AzureRelayListeners (in my case two listeners pointing at different connections) https://gist.github.com/deadwards90/cdcaffcbeada6045710e9cf4d3674c42
DI - Adds mulitple AzureRelayListeners, picked up by the Worker. The RequestHandler is a Singleton but I'm not sure that would make a lot of difference. https://gist.github.com/deadwards90/7ff54966fe24c98973ce9fc91e5f71e8
On occasion, sometimes a connection appears to get into a strange state where it will receive messages and respond in the same way it usually does but the requesting application never receives the response.
So what I can see is a log saying that the operation has been run on the box, no errors or other information like a normal request but then the request application just times out waiting for the response.
This seems to have been exacerbated (or it's a coincidence) by having the application connect to two Hybrid Connections.
Expected Behavior
All responses are received by the calling application when the Stream is disposed.
Actual Behavior
After some time, responses on a connection fail to return to the calling application after the Stream has been disposed of
Versions