aspnet / KestrelHttpServer

[Archived] A cross platform web server for ASP.NET Core. Project moved to https://github.com/aspnet/AspNetCore

Server randomly stops responding mainly on high traffic #2104

Closed ElyseeOkry closed 6 years ago

ElyseeOkry commented 6 years ago

We are using KestrelHttpServer and it randomly stops responding. This can happen a few times a day, or sometimes once every 2-3 days; the behavior is really random. The frequency is higher when we have high traffic. I wrote a small monitoring application which automatically takes a dump when the server stops responding and then restarts it. The dumps are taken using JetBrains dotTrace (https://www.jetbrains.com/profiler/). Have a look at the attached dotTrace screenshot.

It seems that we have a random deadlock in the libuv threads. dotTrace shows which method the lock is coming from: private void ThreadStart(object parameter) in the LibuvThread class, https://github.com/aspnet/KestrelHttpServer/blob/6584a8b5fdaba0a79c52b981cac88472bcc92d1a/src/Kestrel.Transport.Libuv/Internal/LibuvThread.cs

I am not sure what the issue is, but we have a deadlock somewhere in there, probably related to the _startSync lock.

Please could you help? Many thanks

vinh84 commented 6 years ago

Same here 👍: it stops and hangs, with CPU at 1-2%.

halter73 commented 6 years ago

As an application developer, when looking at ASP.NET Core profiles, I would completely ignore any time spent in the LibuvThreads.

The LibuvThread class manages Kestrel's IO threads. There are N LibuvThreads, each of which runs one managed Thread for the lifetime of the server, where N is half the number of logical CPU cores.

There is no issue related to _startSync. The call tree in your profile screenshot shows ~100% of the time in ThreadStart is spent in a call to LibuvFunctions.run, which is expected.
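
(For reference, that IO thread count is configurable through the libuv transport options in Kestrel 2.x. Below is a minimal sketch, assuming the Kestrel.Transport.Libuv package and an existing Startup class; tuning it is not a fix for the hang discussed in this issue.)

using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        var host = new WebHostBuilder()
            .UseKestrel()
            .UseLibuv(options =>
            {
                // Default is roughly half the logical cores; changing it will not
                // fix a hang caused by thread-pool starvation.
                options.ThreadCount = 4; // illustrative value only
            })
            .UseStartup<Startup>()
            .Build();

        host.Run();
    }
}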

gotoxu commented 6 years ago

We are also facing the same problem. When traffic is high, the Kestrel server randomly stops responding and does not recover.

ElyseeOkry commented 6 years ago

This is really annoying, as it blocks the use of ASP.NET Core in applications where transactions are required. Imagine a credit card payment in progress when we hit a deadlock :-(.

I found this issue, which I think is related and should be reopened: https://github.com/aspnet/KestrelHttpServer/issues/1278

I also found reports on the web that libuv pipe operations (open, read) can cause deadlocks.

Regards

halter73 commented 6 years ago

I completely understand that deadlocks can be way more than just annoying. The problem is that there is no evidence indicating that whatever is causing this hang is Kestrel-related. The screenshot in the original issue is 100% normal and expected. For all I know, there could be a bug in the web application causing hangs.

Can you provide a simple repro app that demonstrates the hangs? If you want a simple way to drive load, I recommend wrk.

ElyseeOkry commented 6 years ago

I disagree with you about the interpretation of the dump.

I am desperate. Is there a way to get low-level logs? Not MVC logs, but logs from earlier in the pipeline.

benaadams commented 6 years ago

Aside: libuv threads spend their time waiting for input from the network, but in native code, which is why they show up high in a .NET profiler; to the profiler it's a function that was called and has not yet returned.

If you are using Windows Server, then when it is in a "hung" state you can right-click the process in Task Manager and choose "Create Dump File", which can be analyzed with VS (if it isn't too big) or WinDbg with SOS.

Do you make blocking calls on the threadpool in your app's code rather than async calls? e.g. Read rather than ReadAsync, or Write vs WriteAsync.

Do you use Task.Wait, .Result, Thread.Sleep, or GetAwaiter().GetResult() rather than Task.WhenAll/WhenAny, await, Task.Delay, etc.?

A quick test to see if you are starving the threadpool with blocking calls (other than looking at the dump file) is to raise the threadpool's minimum thread count and see if that alleviates some of the issues:

// Add at Main()
ThreadPool.SetMinThreads(1024, 1024);

This isn't a solution, as it causes a host of other issues and only delays the problem, but it may help with the investigation.

ElyseeOkry commented 6 years ago

hi Adam,

Aside: libuv threads spend their time waiting for input from the network, but in native code, which is why they show up high in a .NET profiler; to the profiler it's a function that was called and has not yet returned.

That is exactly the point. I wrote that this happens under high traffic, so these libuv threads shouldn't be waiting for input; they have input, and a lot of it. If they are in a waiting state despite having plenty of input, or stuck in functions that have not yet returned, it means we have some kind of blockage somewhere.

If you are using Windows Server, then when it is in a "hung" state you can right-click the process in Task Manager and choose "Create Dump File", which can be analyzed with VS (if it isn't too big) or WinDbg with SOS.

I have a classic Windows dump, 2 GB :-(, and I don't know how to analyze it.

Do you make blocking calls on the threadpool in your app's code rather than async calls? e.g. Read rather than ReadAsync, or Write vs WriteAsync. Do you use Task.Wait, .Result, Thread.Sleep, or GetAwaiter().GetResult() rather than Task.WhenAll/WhenAny, await, Task.Delay, etc.?

Our web app gets its data from a Web API (only when it is not yet cached; the Web API is also ASP.NET Core) using our own Web API client library. This client library uses a single instance of HttpClient and queries the API with var response = HttpClient.SendAsync(request).Result; We don't want to make all the API calls asynchronous, or at least we want to hide the asynchronous calls internally. Not sure if this issue is related to that.

Drawaes commented 6 years ago

That's the problem. You are starving the threadpool. You can't do that; you need to use async.

Drawaes commented 6 years ago

What are you trying to achieve by calling Result? You can make your controller method async and return a Task, then just call

var response = await HttpClient.SendAsync(request);

This will wait for you but release the thread back to the thread pool so that other requests can be handled.

Also, a single HttpClient by default allows only 2 connections to a single host. If all your methods are trying to hit the same host, you will likely run into that limit as well.

benaadams commented 6 years ago

Changing

var response = HttpClient.SendAsync(request).Result;

to

var response = await HttpClient.SendAsync(request);

That would fix it, but you'd need to propagate async up the call chain.

Otherwise you will need to boost your minimum thread level:

// Add at Main()
System.Net.ServicePointManager.DefaultConnectionLimit = 256; // Max concurrent outbound requests

System.Threading.ThreadPool.GetMaxThreads(out int _, out int completionThreads);
System.Threading.ThreadPool.SetMinThreads(256, completionThreads); // or higher

However, this just raises the number of concurrent requests you can handle before encountering the same issue again (or a different issue from too many running threads), as not being async has inherent scaling issues.

ElyseeOkry commented 6 years ago

@Drawaes, @benaadams

We are not using HttpClient directly in our web application.

  var response = HttpClient.SendAsync(request).Result;

What you are asking is to change the execute method to use await. That would mean all 100+ public methods must become async. We would also need to change all our applications that consume the Web API client.

Question: is there a technique to keep the asynchrony inside the execute method only, without affecting the other public methods?

Also a single http client by default allows 2 connections to a single host. If all your methods are trying to hit this then you will likely hit that as well.

Well, sincerely, I didn't know this. Based on multiple articles on the web, like this one https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/, we changed our code to use a single instance of HttpClient.

I think the least painful change is to try this:

// Add at Main()
System.Net.ServicePointManager.DefaultConnectionLimit = 256; // Max concurrent outbound requests
System.Threading.ThreadPool.GetMaxThreads(out int _, out int completionThreads);
System.Threading.ThreadPool.SetMinThreads(256, completionThreads); // or higher

verysimplenick commented 6 years ago

Remember not to use .Result; use .ConfigureAwait(false).GetAwaiter().GetResult() if you need sync execution!
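
As a minimal sketch of that pattern (ApiClient is a hypothetical wrapper around the HttpClient call discussed above):

using System.Net.Http;

public class ApiClient
{
    private static readonly HttpClient Client = new HttpClient();

    // Sync wrapper per the suggestion above. GetAwaiter().GetResult() unwraps
    // the original exception (rather than the AggregateException you get from
    // .Result), but it still blocks a thread-pool thread, so it does not remove
    // the starvation risk discussed in this thread.
    public HttpResponseMessage Send(HttpRequestMessage request)
    {
        return Client.SendAsync(request)
            .ConfigureAwait(false)
            .GetAwaiter()
            .GetResult();
    }
}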

Drawaes commented 6 years ago

Yeah, the HttpClient 2-connection limit is a trap we have probably all fallen into at some stage, so don't feel bad. Increasing the connection count will certainly ease the pain and is a simple change.

Ideally yes you make the wholesale change to async but as you said that is going to be a massive amount of rework.

Have you tried firing the method in a wrapping task, if you can't change it?

var result = await Task.Run(blockingmethod);

That's not exact as I am on my mobile, but give it a shot; you don't have much to lose by trying. With both of those you might ease your pain.
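
Tidied up, the idea is something like this (LegacyApiClient and GetDataBlocking are hypothetical stand-ins for the existing blocking library call):

using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

public class DataController : Controller
{
    // Hypothetical legacy client whose blocking call cannot be changed.
    private readonly LegacyApiClient _legacyClient;

    public DataController(LegacyApiClient legacyClient)
    {
        _legacyClient = legacyClient;
    }

    [HttpGet]
    public async Task<IActionResult> GetData()
    {
        // Offload the blocking call to a thread-pool thread and await it,
        // so the request thread is released while the call runs.
        var result = await Task.Run(() => _legacyClient.GetDataBlocking());
        return Ok(result);
    }
}

Note this still occupies a thread-pool thread for the duration of the blocking call; it only keeps the request pipeline itself async.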

ElyseeOkry commented 6 years ago

@verysimplenick thanks I will give a try.

Drawaes commented 6 years ago

@verysimplenick it won't make a massive amount of difference, I suspect. GetAwaiter().GetResult() will just unwrap exceptions.

The ConfigureAwait(false) won't make a huge difference, as the context in ASP.NET Core isn't single-threaded.

ElyseeOkry commented 6 years ago

@Drawaes thank you

var result = await Task.Run(blockingmethod);

The moment I use the await keyword, the method must be marked async. What I am trying to do is leave the execute method synchronous while the HttpClient call it makes is asynchronous, and that is not possible. It seems I can't do anything unless I go for massive changes :-( crazy.

verysimplenick commented 6 years ago

@Drawaes, yes, I forgot that ASP.NET Core doesn't have a SynchronizationContext.

benaadams commented 6 years ago

For HttpClient you want to increase ServicePointManager.DefaultConnectionLimit (as per https://github.com/aspnet/KestrelHttpServer/issues/2104#issuecomment-337170742); otherwise it will only make 2 outbound connections to the same host and the others will queue, which causes an extra backlog because the threadpool threads are blocked for longer.

Using a single instance of HttpClient means it will use connection keep-alive, which is no bad thing and will also help throughput.

Then, if you can't or won't actually go async, your only option is to increase System.Threading.ThreadPool.SetMinThreads; though that isn't a good solution, as it increases context switching. Each thread has a 1MB stack, so 1000 threads burn 1GB to do no extra work, and the work is done less efficiently; CPU caches are used poorly, etc.

benaadams commented 6 years ago

As @Drawaes points out, on Ubuntu the default thread stack is 8MB, so 1000 threads will be using 8GB.

ElyseeOkry commented 6 years ago

@benaadams thanks guys, I have increased DefaultConnectionLimit and SetMinThreads on the thread pool; this is the quicker fix. Given the size of my Web API client library, changing all methods to async is going to take us a lot of time and is not preferable. I will do that as a last resort if the thread changes don't help. I will come back to you with the result.

samcic commented 6 years ago

Our production web app just experienced the same problem: during more or less our highest-traffic time of the week, just a few hours ago, everything "froze up" and stopped responding. A test of all dependencies (storage accounts, Azure SQL Database) showed nothing abnormal. Pages that don't use any external dependencies were also not responding. CPU and memory usage were nothing abnormal. Some requests would succeed after around a minute or so; others would receive a generic 5xx IIS error page. I wasn't able to find anything in the logs, though I noticed later that we forgot to include UseAzureAppServices in Startup.cs (which may explain the lack of Kestrel logs?).

Just last Sunday we swapped in this .NET core app as a replacement for our "old" ASP.NET MVC 5 app. Up until today things had been running very smoothly and we were starting to get convinced that the migration had been a success.

To fix the problem, we quickly swapped back to the old ASP.NET MVC 5 app and the problems were fixed immediately.

Anyway, I'm very glad I found this thread because it seems related to our problem. Specifically, we're using this code a lot in our new .NET Core app to block on async calls:

        // Shared TaskFactory (same settings as in the repro further down).
        private static readonly TaskFactory MyTaskFactory = new TaskFactory(
            CancellationToken.None, TaskCreationOptions.None, TaskContinuationOptions.None, TaskScheduler.Default);

        public TResult RunSync<TResult>(Func<Task<TResult>> func)
        {
            var cultureUi = CultureInfo.CurrentUICulture;
            var culture = CultureInfo.CurrentCulture;
            return MyTaskFactory.StartNew(() =>
            {
                Thread.CurrentThread.CurrentCulture = culture;
                Thread.CurrentThread.CurrentUICulture = cultureUi;
                return func();
            }).Unwrap().GetAwaiter().GetResult();
        }

(We "borrowed" this from the AsyncHelper class in Microsoft.AspNet.Identity, the "old" identity library for ASP.NET MVC 5)

Anyway, the comments here are making me think that our excessive use of the above code is starving our threadpool. We're now going to invest the effort to make everything async "all the way down", which we were planning to do at some stage anyway. "Every cloud has a silver lining" :)... in my opinion it's not a bad thing that this problem is forcing us to fix this sooner rather than later!

Like @ElyseeOkry , I'll let you know if this work resolves the problem.

Drawaes commented 6 years ago

Why block at all? You don't need to; just make your controller method return a Task. The problem for @ElyseeOkry was changing 100 API calls; if that isn't your case, remove the blocking code.
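
A minimal before/after sketch (the controller, actions and URL are placeholders, not from the app discussed here):

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

public class StatusController : Controller
{
    private static readonly HttpClient _httpClient = new HttpClient();

    // Before: blocks a thread-pool thread for the whole HTTP round trip.
    [HttpGet]
    public IActionResult GetStatusBlocking()
    {
        var response = _httpClient.GetAsync("https://example.com/health").Result;
        return Ok(response.StatusCode);
    }

    // After: the thread goes back to the pool while the request is in flight.
    [HttpGet]
    public async Task<IActionResult> GetStatus()
    {
        var response = await _httpClient.GetAsync("https://example.com/health");
        return Ok(response.StatusCode);
    }
}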

samcic commented 6 years ago

@Drawaes Yep, completely agree we don't need to block at all. In our case these blocking calls were carried over from earlier code where the effort was never invested to go async all the way. In our case it's many more than 100 API calls we have to change, but we've already started and are making good progress. One day of intense refactoring and it should be done! All I can say is thank you, ReSharper...

Drawaes commented 6 years ago

Feel free to head over to https://github.com/dotnet/corefx/issues/8931 and thumbs-up or comment. Basically, if .Result is marked obsolete you can still use it, but you will get a warning, which could be something like "This could cause adverse scaling or deadlocking issues in an async environment".

samcic commented 6 years ago

I would like to try to reproduce this on Azure to confirm our assumption that it's indeed the blocking behavior that's causing this issue. Do you think it would be sufficient to do the following to reproduce it?

If you think this might be a good approach I'll test it out and see if the symptoms are the same as I experienced (and see if I can find some relevant log messages).

benaadams commented 6 years ago

Hit the first action with a large number of requests (hundreds, thousands...)

Will be much lower than you think... ? (For the above example)

Should work

samcic commented 6 years ago

So I'm struggling a bit to conclusively reproduce this. Here's what I've done:

        private static readonly TaskFactory MyTaskFactory = new TaskFactory(CancellationToken.None, TaskCreationOptions.None, TaskContinuationOptions.None, TaskScheduler.Default);

        private void RunSync(Func<Task> func)
        {
            var cultureUi = CultureInfo.CurrentUICulture;
            var culture = CultureInfo.CurrentCulture;
            MyTaskFactory.StartNew(() =>
            {
                Thread.CurrentThread.CurrentCulture = culture;
                Thread.CurrentThread.CurrentUICulture = cultureUi;
                return func();
            }).Unwrap().GetAwaiter().GetResult();
        }

        [HttpGet]
        public ContentResult BlockRunSync()
        {
            RunSync(() => Task.Delay(TimeSpan.FromMinutes(5)));
            return Content("Done", "text/plain; charset=utf-8");
        }

        [HttpGet]
        public ContentResult DontBlock()
        {
            return Content("Done", "text/plain; charset=utf-8");
        }
Created a new Console application with Main containing:

            for (int i = 0; i < 200; i++)
            {
                ThreadPool.QueueUserWorkItem(o =>
                {
                    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("http://mysitehidden.azurewebsites.net/home/block?threadid=" + Thread.CurrentThread.ManagedThreadId);
                    myRequest.Timeout = 500000;
                    myRequest.GetResponse();
                });
            }

            while (true)
            {

            }

Here's the last portion of the Fiddler logs when I run the console application, and concurrently periodically refresh the "dontblock" action using Chrome.

| # | Result | Protocol | Host | URL | Body | Content-Type | Process |
|---|--------|----------|------|-----|------|--------------|---------|
| 1399 | - | HTTP | mysitehidden.azurewebsites.net | /home/blockrunsync?threadid=121 | -1 | | consoleapp5:17752 |
| 1400 | - | HTTP | mysitehidden.azurewebsites.net | /home/blockrunsync?threadid=122 | -1 | | consoleapp5:17752 |
| 1401 | - | HTTP | mysitehidden.azurewebsites.net | /home/blockrunsync?threadid=123 | -1 | | consoleapp5:17752 |
| 1402 | - | HTTP | mysitehidden.azurewebsites.net | /home/blockrunsync?threadid=124 | -1 | | consoleapp5:17752 |
| 1403 | - | HTTP | mysitehidden.azurewebsites.net | /home/blockrunsync?threadid=125 | -1 | | consoleapp5:17752 |
| 1404 | - | HTTP | mysitehidden.azurewebsites.net | /home/blockrunsync?threadid=126 | -1 | | consoleapp5:17752 |
| 1407 | 200 | HTTP | mysitehidden.azurewebsites.net | /home/dontblock | 139 | text/plain; charset=utf-8 | chrome:12924 |
| 1408 | 200 | HTTP | mysitehidden.azurewebsites.net | /favicon.ico | 32,038 | image/x-icon | chrome:12924 |
| 1409 | 200 | HTTP | mysitehidden.azurewebsites.net | /home/dontblock | 139 | text/plain; charset=utf-8 | chrome:12924 |
| 1410 | 200 | HTTP | mysitehidden.azurewebsites.net | /favicon.ico | 32,038 | image/x-icon | chrome:12924 |
| 1411 | 200 | HTTP | mysitehidden.azurewebsites.net | /home/dontblock | 139 | text/plain; charset=utf-8 | chrome:12924 |
| 1412 | 200 | HTTP | mysitehidden.azurewebsites.net | /favicon.ico | 32,038 | image/x-icon | chrome:12924 |

So, even given over 100 requests, the "dontblock" action is still responding within a second or two each time.

My expectation, given everyone's comments, was that calls to "dontblock" would cease to work after perhaps 20 requests to "blockrunsync" had built up.

From the Kudu console I can see the thread count building up over time during this experiment:

(screenshot: Kudu Process Explorer showing the thread count increasing)

That is, the server seems to have no problem "scaling" in this situation by simply adding more threads.

In our production app at high load we have perhaps only 500-600 requests per minute, so I can't imagine that we'd be reaching 100+ threads that are blocked at any given time. Our average response time for requests is less than 100ms.

Might somebody have an idea why I can't reproduce the expected result here (i.e. the server more or less stops responding to requests)? Did I perhaps miss something trivial in my approach? A large part of me wants to reproduce this before I'm convinced that changing our app to "async all the way" (which we're doing now regardless) will actually fix the problem.

ghost commented 6 years ago

This is my test code: (screenshots)

ASP.NET Core 2.0 result: (screenshot)

ASP.NET MVC 4 result: (screenshot)

Who can tell me why?

benaadams commented 6 years ago

Created a new Console application with Main containing:

The WebRequests are queued to the threadpool and are synchronous, so they will be throttled in the client in the same way, with the client's threads climbing at the same rate as the server's; you need to make async requests. You also need to ReadAsync from the response stream: after GetResponse() in your example, the HttpWebRequest will be put on the finalizer queue and the connection will be chopped.

Also, the threadpool will only inject about 2 threads per second, so you have to keep the requests coming.
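
(Aside: a crude way to watch that growth during a load test is to poll the pool counters from a console app; a sketch, nothing more.)

using System;
using System.Threading;

class PoolMonitor
{
    static void Main()
    {
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
        while (true)
        {
            // Busy threads = max minus currently available; watch this climb
            // slowly (roughly 2/sec) once the minimum is exhausted.
            ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);
            Console.WriteLine($"busy workers: {maxWorkers - freeWorkers}, busy IO: {maxIo - freeIo}");
            Thread.Sleep(1000);
        }
    }
}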

only 500-600 requests per minute

Removing the blocking on the client and recreating the rate would look more like

public class Program
{
    private static string MyBlockingUrl = "http://mysitehidden.azurewebsites.net/home/block";
    private static System.Threading.Timer _timer;

    public static void Main(string[] args)
    {
        // Increase number of concurrent outbound requests
        ServicePointManager.DefaultConnectionLimit = 1000;

        // Submit 500 requests per minute
        _timer = new System.Threading.Timer(state => CreateRequests(state), null, TimeSpan.Zero,
            TimeSpan.FromMinutes(1));

        Thread.Sleep(TimeSpan.FromMinutes(30));
    }

    private static void CreateRequests(object state)
    {
        for (int i = 0; i < 500; i++)
        {
            CreateRequest();
        }
    }

    private static void CreateRequest()
    {
        Task.Run(async () =>
        {
            HttpWebRequest myRequest = (HttpWebRequest) WebRequest.Create(
                MyBlockingUrl + "?threadid=" + Thread.CurrentThread.ManagedThreadId);
            myRequest.Timeout = 5000000;

            using (WebResponse response = await myRequest.GetResponseAsync())
            {
                using (Stream responseStream = response.GetResponseStream())
                {
                    await responseStream.CopyToAsync(System.IO.Stream.Null);
                }
            }
        });
    }
}

However, there are still protections you are likely to hit. If the client is running on a client OS rather than a server OS, the OS may rate-limit connections to a single IP address. IIS in App Service may rate-limit connections from a single IP address. IIS will also recycle and restart the dotnet process when memory gets too high or it stops responding, etc.

benaadams commented 6 years ago

@taoli0124 Could you try

public async Task<IActionResult> Index()
{
    var text = String.Empty;
    HttpWebRequest request = WebRequest.CreateHttp("http://localhost:807/v5/dist/css/base.min.css");
    using (WebResponse response = await request.GetResponseAsync())
    {
        using (Stream myStream = response.GetResponseStream())
        {
            using (StreamReader sr = new StreamReader(myStream))
            {
                text = await sr.ReadToEndAsync();
            }
        }
    }

    return Content(text);
}

benaadams commented 6 years ago

@taoli0124 Also, what happens the second time you run ab (e.g. is it startup cost)? Are you using Kestrel 2.0.0? And the -k flag will enable keep-alive in the ab test, which will give you more representative results.

Drawaes commented 6 years ago

I would use wrk to generate the load. Tell it to make 2000 connections and give them a high timeout...

samcic commented 6 years ago

@benaadams Many thanks for your input and helpful response, Ben. I've now been able to reproduce the issue thanks to your tips. I am indeed on Windows 10 Enterprise, which did seem to throttle the outbound requests from my machine to Azure; however, there were still "spurts" of 20 or so at a time, which for my BlockRunSync action (waiting only 2 seconds on each call) was enough to choke the server. The server was unresponsive to my directly-returning DontBlock action during this time (I tested that using my mobile phone on mobile data, which has a different IP, to rule out the possibility that the unresponsiveness was due to OS throttling on my machine).

Minutes later, Fiddler showed that each of the requests to the blocking action returned a 502 (Bad Gateway) error from IIS/8.0 on Azure (presumably because IIS realized that Kestrel wasn't responding). Once all of these "drained", the server started responding again (both from my machine and from my phone).

I still wasn't able to find anything particularly interesting in the log files (I'm not overly familiar with how Kestrel logging works, though).

Anyway, re-running the same test against the BlockAsync action showed the expected and correct behavior (i.e. nice and responsive with no errors or unexpected delays).
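
(BlockAsync is simply the non-blocking twin of BlockRunSync; roughly:)

        [HttpGet]
        public async Task<ContentResult> BlockAsync()
        {
            // Same 2-second wait, but the thread is released while it runs.
            await Task.Delay(TimeSpan.FromSeconds(2));
            return Content("Done", "text/plain; charset=utf-8");
        }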

From my side the message is now relatively clear: ASP.NET Core apps struggle under "high load" when actions are blocking, where "high load" seems to be on the order of tens of requests per second. This is enough to motivate us further to switch to "async all the way", and we'll continue that work with more confidence that it will resolve the downtime we had yesterday.

Thanks again for your help! If anybody thinks it would be useful to see the client and server code I used, happy to post it here for reproduction (just let me know).

ElyseeOkry commented 6 years ago

@samcic Many thanks for this report. I think the conclusion (all async) is really key information that the ASP.NET team should propagate. People need to know this before migrating existing apps to ASP.NET Core. The cost of making all our libraries async is too high in my case, and we could have avoided this situation if this information had been in the official documentation. Libraries are shared between apps, and only ASP.NET Core will require the async versions... The other thing is that we don't have this issue in old ASP.NET; isn't there anything the team can borrow from the old stack? Doing things differently?

@samcic, please post the test here. I believe people can modify it to their needs and run it on their own servers.

benaadams commented 6 years ago

@ElyseeOkry you can detect it in development by adding an event listener and having it output the stack-trace locations, rather than waiting until it jams in production: https://github.com/dotnet/corefx/issues/8931#issuecomment-337354565

samcic commented 6 years ago

Yep, I agree it would have been nice to have been warned a bit more about this potential issue, perhaps somewhere in the ASP.NET MVC 5 => ASP.NET Core migration docs. There's also the issue @Drawaes referenced above about making Task.Result obsolete, which we perhaps would have noticed too had it been there during our migration work.

Test attached below (containing two projects: one for the web app and one for the client console test app). I ran it on Windows 10 using VS2017 with Fiddler.

azure-async-test.zip

ElyseeOkry commented 6 years ago

@benaadams Thank you. Now, yes, I can and I will. But wouldn't it be better to include this kind of detector in ASP.NET Core itself, so people can have it enabled by default in debug mode? Your advice is in this thread, but how many people will read this useful advice in 2, 3, or 6 months, when they start having issues? Too late...

ghost commented 6 years ago

@benaadams Thank you for solving my problem.

vinh84 commented 6 years ago

This issue may be caused by logging. I use Serilog with a sink to Splunk, and it hangs very quickly, in 5 minutes or less under high traffic. When I remove it and only log to the console, it hangs only occasionally (maybe every 3 days or more).

When it hangs, telnet to the port is OK, but curl times out. CPU is at 1-2%, memory normal.

Currently I use a script to detect this problem and restart the Docker container 👎

Drawaes commented 6 years ago

Are you using UDP or TCP to send to splunk?

vinh84 commented 6 years ago

(TCP) HTTP. Currently Serilog for .NET Core only has an HTTP sink for Splunk.

That's my case, I don't know why :(


Drawaes commented 6 years ago

Well, we use Serilog in high-volume environments and it's fine. I suspect there is an issue with the Splunk sink... I would double-check its source code.

samcic commented 6 years ago

Just in case anyone's interested, I wanted to share some comments regarding our rework and redeployment. We spent around three full days reworking our entire ASP.NET Core 2.0 app to use async/await "all the way". This included:

With the C# compiler and ReSharper working together, it felt like it would have been pretty difficult to get the rework wrong, so we were mostly confident that we weren't making highly-risky changes.

We're more than pleased with the result: all the main parts of our pipeline (middleware, filters, controllers) are now exploiting async/await. We're looking forward to this paying off in future in terms of resource usage and scalability.

With all tests passing, we redeployed on Saturday morning. We've now passed the busiest day of the week in Europe (Tuesday) and haven't seen anything like what we saw last Tuesday (server not responding). Of course that's not conclusive evidence that the problem is resolved, but it's a good start. If something does indeed go wrong in the coming days, I'll write an update here. Otherwise assume that the rework to async/await fixed this issue for us.

Long story short: for anybody migrating an existing app that does not already make heavy use of async/await to .NET Core, based on our experience we'd advocate investing the effort to get async/await happening all the way.

ElyseeOkry commented 6 years ago

@samcic many thanks for sharing this useful information.

I wish the team would put this in the official documentation!!!!

We haven't done the all-async changes, given the size of our applications. It would take weeks to complete.

We have a workaround which is working so far:

We put connectionLimit and threadCount in the config and adjust those variables based on our server and maximum concurrent users.

The issue stops when we set threadCount to 2000 and connectionLimit to 300. Memory used is around 600MB, and we have 6GB.
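
Roughly, the startup tuning looks like the sketch below (the config key names are just placeholders; adjust them to your own appsettings):

using System.Net;
using System.Threading;
using Microsoft.Extensions.Configuration;

public static class StartupTuning
{
    public static void Apply(IConfiguration config)
    {
        int connectionLimit = config.GetValue<int>("Tuning:ConnectionLimit", 300);
        int minThreads = config.GetValue<int>("Tuning:ThreadCount", 2000);

        // Max concurrent outbound HTTP connections per host.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;

        // Let the thread pool grow immediately instead of injecting ~2 threads/sec.
        ThreadPool.GetMaxThreads(out _, out int completionThreads);
        ThreadPool.SetMinThreads(minThreads, completionThreads);
    }
}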

Conclusion:

benaadams commented 6 years ago

I've made a BlockingDetector package to output warnings to the logs when you block: https://github.com/benaadams/Ben.BlockingDetector

It only detects blocking when it actually happens, so it doesn't pick up calls that may block but don't, nor warn about coding practices that lead to blocking (or blocking that happens at the OS rather than the .NET level, e.g. File.Read), but it may help pick things up.
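
Wiring it up is a couple of lines in Startup, roughly like this (see the repo README for the exact namespace and extension method):

using Ben.Diagnostics;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services) => services.AddMvc();

    public void Configure(IApplicationBuilder app)
    {
        // Register early so blocking anywhere later in the pipeline is reported
        // as a warning with a stack trace in the logs.
        app.UseBlockingDetection();

        app.UseMvc();
    }
}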

ElyseeOkry commented 6 years ago

@benaadams Thank you for this ;-). I appreciate it !

ElyseeOkry commented 6 years ago

Sorry, I closed it by mistake.

aspnet-hello commented 6 years ago

We periodically close 'discussion' issues that have not been updated in a long period of time.

We apologize if this causes any inconvenience. We ask that if you are still encountering an issue, please log a new issue with updated information and we will investigate.