dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.19k stars 9.93k forks source link

NodeServices - Can we add a lock to HttpNodeInstance? #14255

Closed dogfoodsilo4 closed 4 years ago

dogfoodsilo4 commented 4 years ago

This is a request for some reassurance around a big problem and our proposed solution...

We have a slightly unusual use case for NodeServices. We migrated a legacy express.js application so that we can call the javascript code from our .Net core application using NodeServices. We couldn't continue to support the size of data structures our business model requires (200MB+ JSON) on express, due to memory limits and half our application was in .Net land already. (Please don't say what we are doing is mad. It's actually a great solution and we cannot change the business requirements.)

Up til recently NodeServices has worked perfectly, we can support those large data structures and the memory issues we were having are no more.

However, we have found that when making hundreds of concurrent calls to js via NodeServices we get an error raised from inside the NodeServices.dll

2019-09-13T04:56:09.9514063-04:00 0HLPO506TM9HR:00000005 [ERR] An error occurred while sending the request. (37785cd4)
System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.IO.IOException: The server returned an invalid or unrecognized response.
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.AspNetCore.NodeServices.HostingModels.HttpNodeInstance.InvokeExportAsync[T](NodeInvocationInfo invocationInfo, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.NodeServices.HostingModels.OutOfProcessNodeInstance.InvokeExportAsync[T](CancellationToken cancellationToken, String moduleName, String exportNam

This only occurs randomly when making around 50+ requests per second.

I'd post some code to replicate this, but unfortunately all our test code works ok. Outside of calls via IIS, we can't seem to load the test code enough to force the error.

We spent many days debugging the code and it is clear that the error occurs in the call to the node httpserver (created by entrypoint-http.js) from HttpNodeInstance.cs (line 68)

response = _client.PostAsync(_endpoint, payload, cancellationToken).Result;

Our theory is that duplicate calls are being somehow merged (due to a concurrency issue?) and the response cannot be parsed, hence the error.

This has to be a threading issue. We have no way to identify such an exception and retry the specific request. This also only happens in a swarm of requests so tracing calls through the code is tricky. We added debug logging to the httpNodeInstance, but we cannot pass in a request-id context to know which logs belong to which request.

Having tried lots of ideas, the only solution we have is to add a lock around the _client.PostAsync call:

    static readonly object _locker = new object();  // class level var

    HttpResponseMessage response;
    lock (_locker)
    {
        response = _client.PostAsync(_endpoint, payload, cancellationToken).Result;
    }

This resolves our issue!

We tested performance and we see no degradation, we are happy to implement this fix.

I am simply wondering if anyone would object to adding a lock in the mainline code? We've forked the code, but I don't really want to support a separate repo just for us. I'd post some code to replicate this, but unfortunately all our test code works ok. Outside of calls via IIS, we can't seem to load the test code enough to force the error. We want to stay up to date with the mainline code.

If anyone has any other suggestions, please let me know.

Thanks

SteveSandersonMS commented 4 years ago

I'm not sure where you're finding this line of code that you posted:

response = _client.PostAsync(_endpoint, payload, cancellationToken).Result;

That line worries me because it's blocking the thread. But there is no such line in the source in this repo. What looks like the closest equivalent in HttpNodeInstance.cs line 70 doesn't look like that. It doesn't evaluate .Result, so is properly asynchronous as it should be.

As for your original question, please note that as was announced here, NodeServices is being deprecated. So, I'm afraid we wouldn't take a PR that changes its implementation. You are however welcome to fork the relevant NodeServices code from this repo and change it however you want in your own application.

dogfoodsilo4 commented 4 years ago

Thanks for replying so quickly @SteveSandersonMS.

You are right (I copied the wrong code sorry), the line we have the issue with is:

var response = await _client.PostAsync(_endpoint, payload, cancellationToken);

and our issue is apparent in this original code.

Can you suggest an alternative to our proposed code to help resolve the issue? I know it's cheeky to ask but I cannot find an alternative solution, despite nearly 2 weeks trying to resolve this.

I was not aware of the deprecation of NodeServices. I've done some research and there doesn't appear to be a supported migration path for anyone using NodeServices to call JS serverside.

I guess we'll have to fork NodeServices.

Thanks again.

dogfoodsilo4 commented 4 years ago

Interestingly, one of our QA teams has just (this minute) replicated the issue with our updated code (including the lock) at a much higher load.

So it seems adding the lock doesn't resolve the issue, it just slows?/sorts? the processing enough to force the problem further down the line.

SteveSandersonMS commented 4 years ago

I'm afraid I don't have any further insight into your issue. If it's still reproducible with the lock, then that suggests the problem is actually somewhere else, since the lock takes care of reducing the load (on this line of code) to a single request anyway.

dogfoodsilo4 commented 4 years ago

I have a clear path forward now.

The issue has to lie in the nodejs V8 engine (tested with v10 & 12) as I've replicated this issue with a js file that simply returns a null value.

I'll explore elsewhere.

Cheers for your help @SteveSandersonMS