HangfireIO / Hangfire

An easy way to perform background job processing in .NET and .NET Core applications. No Windows Service or separate process required
https://www.hangfire.io

StackExchange.Redis 2.x #1266

Open brightertools opened 5 years ago

brightertools commented 5 years ago

I have a .NET Core 2.1 app that uses Hangfire Pro with Redis, SignalR with Redis, and session state with Redis.

The app currently references the "StrongName" version 1.2.6, and I seem to be getting many timeouts, which is causing havoc with my sessions. I want to try StackExchange.Redis 2.x and am wondering when it will be available in Hangfire Pro.

markalanevans commented 5 years ago

@odinserj we have the same issue.

We host our app on Azure with a Redis instance and see lots of timeouts. Which version are you bundling into Hangfire.Pro.Redis?

brightertools commented 5 years ago

We were using the Basic tier of Redis and saw a lot of issues disappear when we scaled up to the next tier. We use the same Redis server for session state and SignalR.

marcoCasamento commented 5 years ago

I'd suggest increasing syncTimeout in the connection string. I'm not sure how to do it in Hangfire.Redis, but it should be something like

localhost:6379,syncTimeout=5000

in the connection string. Especially when jobs are created in large batches, Hangfire can send a large amount of data in a single transaction, and that can exceed the StackExchange.Redis default syncTimeout of 1 second.
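For illustration, a minimal C# sketch of how StackExchange.Redis reads that option, assuming the StackExchange.Redis NuGet package is referenced. `ConfigurationOptions.Parse` only parses the string and does not open a connection, so no Redis server is needed to run it. The class name `RedisTimeoutConfig` is hypothetical, and whether Hangfire.Pro.Redis passes the string through to SE.Redis unchanged is an assumption, not something confirmed here.

```csharp
using System;
using StackExchange.Redis;

// Hypothetical helper: parses a connection string so the timeout can be inspected or adjusted.
public static class RedisTimeoutConfig
{
    // Parses the string into options; no connection is attempted here.
    public static ConfigurationOptions Build(string connectionString)
    {
        ConfigurationOptions options = ConfigurationOptions.Parse(connectionString);
        // SyncTimeout is expressed in milliseconds.
        return options;
    }

    public static void Main()
    {
        // 5000 ms instead of the 1 s default mentioned above.
        ConfigurationOptions options = Build("localhost:6379,syncTimeout=5000");
        Console.WriteLine($"syncTimeout = {options.SyncTimeout} ms");

        // The same setting applied in code, e.g. for a slower hosted tier.
        options.SyncTimeout = 15000;
        Console.WriteLine(options.ToString());

        // Connecting would then be: ConnectionMultiplexer.Connect(options)
    }
}
```

Setting the value in the connection string keeps it configurable per environment, which matches the per-environment values (5000 on-premises, 15000 on Azure Basic) reported later in the thread.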

I've also tried StackExchange.Redis 2.0 in another storage to see whether it would avoid the need to increase syncTimeout, but no: the timeouts always arise when dealing with large batches or with the Basic tier on Azure, no matter which StackExchange.Redis version I use.

markalanevans commented 5 years ago

@brightertools what did you scale up to?

brightertools commented 5 years ago

@markalanevans "C1 Standard"

markalanevans commented 5 years ago

@marcoCasamento so for your normal Redis, was that your syncTimeout value?

markalanevans commented 5 years ago

@brightertools did you also configure a longer syncTimeout in your Redis connection string?

brightertools commented 5 years ago

@markalanevans: I have syncTimeout=3000

pieceofsummer commented 5 years ago

It will be impossible to avoid syncTimeout until the storage architecture is upgraded to be fully async, which isn’t happening anytime soon.

Anyway, setting a large syncTimeout should resolve the issues even under heavy load. I personally use 15000 in my configuration, which works perfectly.

marcoCasamento commented 5 years ago

@markalanevans I use 5000 on an on-premises installation and 15000 on the Redis Basic tier on Azure.

pieceofsummer commented 5 years ago

Also, if you're running on Linux, it is recommended to disable Transparent Huge Pages for better database performance.

odinserj commented 5 years ago

The majority of the timeout problems in .NET Core applications were caused by the internal implementation of the SE.Redis networking layer in the netstandard1.5 target, which was the only target available to .NET Core applications in version 1.2.6 and below.

Other platforms worked well: in .NET Framework, custom threads were used to process responses, and in netstandard2.0 (unavailable in official SE.Redis 1.2.6 and below) it's possible to process them on I/O Completion Port threads. But in netstandard1.5 we are limited to worker threads to process response continuations, so if no thread is available to process a response, request threads stay blocked.

Yes, this issue was fixed in SE.Redis 2.0, because the request/response processing logic was rewritten almost entirely. But the new implementation adds another sort of blocking, this time when sending a request, which also results in worse throughput. So I've decided to take 1.2.6 as is, add the netstandard2.0 target, and internalize it into the HF.Pro.Redis package (so there's no longer any need for the HF.Pro.Redis.StrongName package).

These changes are available in Hangfire.Pro.Redis 2.3.0, released yesterday.

odinserj commented 5 years ago

By the way, with 2.3.X you aren’t locked to SE.Redis 1.X in your project, since the package no longer depends on it.

Also, today I realized there are other issues with 1.2.6 running on .NET Core on Linux. The platform doesn’t provide a way to bind asynchronous I/O to completion port threads, so the blocking issues weren’t fully resolved there when using the synchronous API.

I’ll take another look at this issue next week.

P.S. Please don’t put additional load on Marc Gravell; I believe it’s not fair to ask him for help unless the Redis storage support is completely open source 😁

markalanevans commented 5 years ago

@odinserj ok good to know.

We are upgrading to SE.Redis 2.0 now.

I just wanted to understand these issues, and at the bottom of https://stackexchange.github.io/StackExchange.Redis/ Marc Gravell said to contact him there if we had questions ;)

@odinserj Sorry to disrupt. 🙇

I just want these timeout issues gone. It's often the number 1 error in our logs.

So we have upgraded our caching services to use 2.0, but HF is still on 1.2.6, so I'm really hoping the timeouts will go away with the 2.0 update.

odinserj commented 5 years ago

Today I performed load testing using SE.Redis 2.0 and got the same timeouts as with 1.2.X. The problem is that .NET Core on Linux has no way to queue the completion of an asynchronous socket operation to a completion port thread, as the Windows-based implementation does. Instead, it is queued to a regular worker thread, the same kind of thread on which web requests are processed.

In SE.Redis 2.0, response processing logic runs on a dedicated pool of threads, but to queue a continuation to those threads, we still need to handle the completion of the socket's async operation on a worker thread first. So when worker threads are busy and new work (unrelated to Redis) arrives faster than the thread pool injects new threads, we get a timeout.
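The thread injection threshold can be inspected with the base class library alone. A minimal C# sketch, assuming nothing beyond `System.Threading`: `ThreadPool.GetMinThreads` reports how many worker and I/O completion-port threads the pool creates on demand before it falls back to injecting new threads gradually, which is the window in which a burst of unrelated work can delay Redis completions. Raising the minimum with `ThreadPool.SetMinThreads` is a commonly used mitigation, but it is not one proposed in this thread; the class name `ThreadPoolThresholds` and the value 200 are illustrative assumptions.

```csharp
using System;
using System.Threading;

// Hypothetical helper for inspecting and raising thread-pool minimums.
public static class ThreadPoolThresholds
{
    // Current minimum worker and I/O completion-port thread counts.
    public static (int Worker, int Iocp) GetMinimums()
    {
        ThreadPool.GetMinThreads(out int worker, out int iocp);
        return (worker, iocp);
    }

    // Raises the worker minimum (never lowers it) and leaves the IOCP minimum unchanged.
    // Returns false if the runtime rejects the new value.
    public static bool RaiseWorkerMinimum(int desiredWorker)
    {
        ThreadPool.GetMinThreads(out int worker, out int iocp);
        if (desiredWorker <= worker)
        {
            return true; // already at or above the requested minimum
        }
        return ThreadPool.SetMinThreads(desiredWorker, iocp);
    }

    public static void Main()
    {
        var (worker, iocp) = GetMinimums();
        Console.WriteLine($"min worker threads: {worker}, min IOCP threads: {iocp}");

        // Illustrative value only; tune it against your own load tests.
        bool ok = RaiseWorkerMinimum(200);
        Console.WriteLine($"raised: {ok}, now: {GetMinimums()}");
    }
}
```

A higher minimum only shortens the ramp-up under bursts; it does not remove the synchronous blocking itself.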


An asynchronous, task-based Storage API will help eliminate the primary cause of these timeouts, because worker threads at least won't be blocked synchronously waiting for a result. But if the thread pool backlog is too long and/or work items take a non-trivial amount of time to process, we'll get timeouts even when using asynchronous methods. The only way to avoid this is to have completion port threads and the ability to post async I/O continuations back to them.

Meanwhile, the only way to avoid those timeouts is to use the TaskFactory.StartNew method and pass the LongRunning option, which always schedules the given action to execute on a newly created dedicated thread.

await Task.Factory.StartNew(() => BackgroundJob.Enqueue(/* ... */), TaskCreationOptions.LongRunning);

The method above is clean and simple to use, but if the profiler shows that your application spends too much time creating new threads, consider using a custom task scheduler instead (this, for example).
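As one possible shape for such a scheduler (not necessarily the one linked in the comment), here is a minimal C# sketch using only the base class library. `DedicatedThreadTaskScheduler` is a hypothetical name: it runs tasks on a fixed set of threads created up front, so a call such as an enqueue neither waits for thread-pool injection nor pays for a new thread each time. The demo also checks the `LongRunning` behavior described above; that the default scheduler creates a non-pool thread for `LongRunning` is a runtime implementation detail rather than a documented guarantee.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Runs tasks on a fixed set of dedicated (non-thread-pool) threads.
public sealed class DedicatedThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread[] _threads;

    public DedicatedThreadTaskScheduler(int threadCount)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException(nameof(threadCount));
        _threads = Enumerable.Range(0, threadCount).Select(i =>
        {
            var thread = new Thread(WorkerLoop) { IsBackground = true, Name = $"dedicated-{i}" };
            thread.Start();
            return thread;
        }).ToArray();
    }

    public override int MaximumConcurrencyLevel => _threads.Length;

    // Blocks until work arrives; the loop ends once CompleteAdding is called and the queue drains.
    private void WorkerLoop()
    {
        foreach (Task task in _queue.GetConsumingEnumerable())
        {
            TryExecuteTask(task);
        }
    }

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Never inline: work always runs on one of the dedicated threads.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread thread in _threads) thread.Join();
        _queue.Dispose();
    }
}

public static class Demo
{
    public static void Main()
    {
        using var scheduler = new DedicatedThreadTaskScheduler(4);
        var factory = new TaskFactory(scheduler);

        // Stand-in for a call like BackgroundJob.Enqueue(...): report the kind of thread used.
        bool customOnPool = factory.StartNew(() => Thread.CurrentThread.IsThreadPoolThread).Result;
        bool longRunningOnPool = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            TaskCreationOptions.LongRunning).Result;

        Console.WriteLine($"custom scheduler on pool: {customOnPool}");
        Console.WriteLine($"LongRunning on pool: {longRunningOnPool}");
    }
}
```

The trade-off is a fixed concurrency limit: if all dedicated threads are blocked, further work queues behind them, so the thread count should cover the expected number of concurrent blocking calls.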

mgravell commented 5 years ago

FYI: you would be right to say that SE.Redis in the 2.* branch has had some teething problems and stalling issues. We recently fixed a major perf problem in 2.0.588, so it may be worth re-evaluating.

odinserj commented 5 years ago

@mgravell thanks a lot for your work and for your time ❤️ But don't drink too much coffee (oh, you won't anyway, due to restrictions) 😉

mgravell commented 5 years ago

(sees email) Thanks hugely. I wonder what the LD50 of coffee is... is this a murder attempt? :)

odinserj commented 5 years ago

Coffee shop (tm) restricts my attempts, so I've failed anyway 😄