HangfireIO / Hangfire

An easy way to perform background job processing in .NET and .NET Core applications. No Windows Service or separate process required
https://www.hangfire.io

CPU usage issue #1661

Open sawilde opened 4 years ago

sawilde commented 4 years ago

This is an unusual issue and I can't determine what the cause is.

We are using Hangfire (ASP.NET hosted) to run a sequence of small jobs that involve many database interactions.

When run locally, our machines run at 100% CPU, which is more or less expected for an intensive set of background processes.

When run in a container on Azure, I rarely see CPU usage go higher than 25% across all the instances (whether 2 or 4 instances, single or multi core) - I have tried many combinations.

We also tried a simple ASP.NET app with a background thread running a tight loop, and its CPU hit 100% when run on Azure, as expected.
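For reference, a minimal sketch of that kind of sanity check, assuming a generic host with a hosted service (the class and names here are illustrative, not the actual test app):

```csharp
// Sketch: a hosted service that busy-loops on one thread to verify the
// container can actually saturate a CPU core. Names are hypothetical.
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public class BusyLoopService : BackgroundService
{
    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        return Task.Run(() =>
        {
            long counter = 0;
            while (!stoppingToken.IsCancellationRequested)
            {
                counter++; // pure CPU work, no I/O, should pin one core
            }
        }, stoppingToken);
    }
}

// Registered with: services.AddHostedService<BusyLoopService>();
```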

Is there anything in Hangfire that could be limiting CPU usage, to your knowledge? I have trawled through the GitHub code but nothing stands out as potentially related.

odinserj commented 4 years ago

This behavior is likely related to latency. When you test locally, the application and the storage reside on the same machine, or close enough that the round-trip time is small. Local testing also usually involves bare-metal machines that run as fast as possible.

Cloud environments incur additional latency because servers are usually on different networks, so each request takes longer to travel from one machine to another. On top of that, almost every physical resource in the cloud is shared between multiple customers, and the extra processing delays add further latency.

Another weak point of the cloud is the storage subsystem. Praise to the inventors, we have SSDs, but they are still shared and much slower than on bare-metal servers. And SQL Server performs much worse in cloud environments, resulting in additional latency.

As with Hyper-Threading, you can mask the additional latency by increasing the number of workers, or try upgrading the pricing tier of your storage, depending on where the current bottleneck is.
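For context, the worker count is set on the server options; a minimal sketch, assuming the ASP.NET Core `AddHangfireServer` registration (the multiplier is only an example to illustrate the knob, not a recommendation):

```csharp
// Sketch: raising the Hangfire worker count so more jobs run concurrently
// while individual workers are blocked on storage round-trips.
// The multiplier here is an assumption for illustration, not a recommendation.
services.AddHangfireServer(options =>
{
    options.WorkerCount = Environment.ProcessorCount * 20;
});
```

More workers mean more jobs in flight, so the CPU can stay busy while other workers wait on the storage, at the cost of more concurrent load on the database.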

sawilde commented 4 years ago

Hmm, I did wonder that, but I was surprised to find it so "steady". I am being stingy with the workers because I am trying to keep WIP under control, so I feed the sequence into the pipeline at a limited rate (based on the number of workers).

I'll experiment with the workers and let you know how I fare.

odinserj commented 4 years ago

If you are using SQL Server as job storage, consider applying the settings shown there to minimise polling delays – there's a chance workers are sleeping while waiting for their turn in your current configuration.
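For reference, the SQL Server storage settings usually suggested for reducing polling delays look roughly like this – a sketch based on the documented `SqlServerStorageOptions` for Hangfire 1.7+; verify the exact options against the docs for your version:

```csharp
// Sketch: SQL Server storage options aimed at minimising queue polling delays.
// Based on the documented SqlServerStorageOptions; confirm against the docs
// for the Hangfire version in use.
GlobalConfiguration.Configuration.UseSqlServerStorage(connectionString, new SqlServerStorageOptions
{
    CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
    SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
    QueuePollInterval = TimeSpan.Zero,          // pick up new jobs as soon as they are enqueued
    UseRecommendedIsolationLevel = true,
    DisableGlobalLocks = true
});
```

With `QueuePollInterval` left at its default, idle workers sleep between polls, which on a high-latency cloud setup can look exactly like the "steady" low CPU usage described above.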