Closed FixRM closed 10 months ago
Hi, what version of Hangfire.Core are you using, and which storage package and version?
Thank you for the answer, @odinserj. Hangfire.Core 1.8.2, storage Hangfire.Redis.StackExchange 1.8.6.
Thanks, please try upgrading to the latest versions and post an issue in the Hangfire.Redis.StackExchange repository. It's highly likely that the problem occurs there.
Ok, sure. But why do you think it is a storage problem? I mean, storage is abstracted from job invocation, isn't it?
It's from past experience. Another way of troubleshooting is to run the stdump tool to obtain stack traces of all the threads while the issue is happening and post them here; they should contain useful information that will point us in the right direction.
Ok, thanks. I already have stdump logs collected at the right time, but I don't understand what I should look for there to find the problem. I can see the threads that I expect to be there, but the question is why the others don't exist.
Just post them here, as they contain only method names with no arguments, and I will take a look.
Ok, I put them here, but I'd love to learn from you how to read these logs.
Thanks, they look great. Just to confirm: were they captured during the "processing one job at a time" problem? After a brief look I see a lot of workers processing jobs, but many of them are waiting for something. I will check the details tomorrow and get back to you.
Not exactly :( I guess there were about 5-10 jobs per farm. Unfortunately, we forgot to take a snapshot; this time we had to react and reboot quickly, as a business-critical task was delayed.
Thank you very much, @odinserj. It turned out to be an issue with our DI implementation.
We have a WCF service client that is injected into our jobs. The client opens a WCF connection in its constructor and got stuck during metadata discovery due to a long timeout and network-related issues. Technically, the Hangfire worker was busy creating the job instance, but the instance was not yet running and thus not visible in the Dashboard.
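The resolution above maps to a general pattern: avoid blocking I/O in a constructor that a DI container runs during job activation, because the worker is occupied before the job ever reaches "Processing". A minimal sketch of the fix (the `ReportServiceClient` name and `OpenChannel` placeholder are hypothetical, not from this thread), deferring the expensive work with `Lazy<T>`:

```csharp
using System;

// Hypothetical WCF-style client; names are illustrative only.
public class ReportServiceClient
{
    // Anti-pattern (what caused the issue in this thread):
    //   public ReportServiceClient() { _channel = OpenChannel(); }
    // Opening the channel here blocks the Hangfire worker while the job
    // instance is still being *created*, so nothing shows in the Dashboard.

    // Fix: defer the network-bound work until first use inside the job.
    private readonly Lazy<IDisposable> _channel =
        new Lazy<IDisposable>(OpenChannel);

    public void Send(string payload)
    {
        // The connection is opened here, inside the running job, where
        // a long timeout is at least visible as a "Processing" job.
        var channel = _channel.Value;
        Console.WriteLine($"sending {payload.Length} bytes");
    }

    private static IDisposable OpenChannel()
    {
        // Stand-in for ChannelFactory<T>.CreateChannel() plus metadata
        // discovery; this is where the long timeout would occur.
        return new System.IO.MemoryStream();
    }
}
```

Another option with the same effect is injecting a factory (`Func<ReportServiceClient>`) and creating the client inside the job method, so connection failures surface as job failures with retries rather than a silently stuck worker.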
Sorry if it's a known issue or there is a workaround; I didn't manage to find anything similar.
We have 3 servers in a tenant running a batch of recurring jobs. Each server is configured to use 20 worker threads (all servers have 4 virtual cores). After a while we see that only one job is running on each server at a time.
If we reboot the application, HF immediately starts running 20 jobs at a time. What can cause this behavior?
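For reference, the "20 worker threads" setup described above is a configuration sketch along these lines (assuming the standard Hangfire.Core `BackgroundJobServerOptions` API; storage setup is elided):

```csharp
using System;
using Hangfire;

// Assumes GlobalConfiguration has already been pointed at a storage,
// e.g. the Hangfire.Redis.StackExchange package used in this thread.
var options = new BackgroundJobServerOptions
{
    // 20 workers per server, as described above. A worker is occupied
    // from the moment job activation starts, so an activator (DI
    // container) that blocks in a constructor ties up a worker before
    // the job ever appears in the Dashboard.
    WorkerCount = 20
};

using (var server = new BackgroundJobServer(options))
{
    Console.ReadKey(); // keep the host process alive while jobs run
}
```

This is why the symptom looked like "only one job at a time": the workers existed and were busy, but busy inside activation rather than inside visible jobs.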