Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

System notification for unused CPU cores #4767

Open lennartkoopmann opened 6 years ago

lennartkoopmann commented 6 years ago

I just finished a support call with a customer whose systems looked idle but could not keep up with the messages being ingested, so the journal was filling up.

It was a 32-core machine (64 threads) running graylog-server, but the *_processor settings in graylog-server.conf were left at their defaults. The result was that only 4 of the available CPU threads were working on messages, with the remaining 60 threads being idle.

We should not expect customers to know this, but we also should not just set the *_processor settings automatically, because there might be other systems running on the same box and we'd be stealing resources from them, probably making the situation even worse.

This is why I suggest adding a new system notification that is triggered when fewer than $cpu_threads-x threads are used by Graylog (x is to be determined; it is the reserve for OS tasks, the garbage collector, etc.). The notification should link to a documentation page explaining the settings and why changing them is required. There should also be a way to disable the notification (in graylog-server.conf?) for setups where this condition is OK and the constant notification would be annoying.
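As a rough illustration of the proposed trigger condition (a sketch only: the function name `should_notify`, the default reserve of 2, and the `enabled` switch are assumptions made for this example, not Graylog code or settings):

```python
import os


def should_notify(configured_processors, cpu_threads=None, reserve=2, enabled=True):
    """Return True when fewer than (cpu_threads - reserve) threads are configured.

    `reserve` stands in for the "x" from the proposal (headroom for the OS,
    the garbage collector, etc.); its value here is an arbitrary placeholder.
    `enabled` models the proposed opt-out for setups where the condition is OK.
    """
    if not enabled:
        return False
    if cpu_threads is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpu_threads = os.cpu_count() or 1
    # Never let the threshold drop below one thread on very small machines.
    threshold = max(cpu_threads - reserve, 1)
    return configured_processors < threshold
```

With the numbers from the support call, `should_notify(4, cpu_threads=64)` evaluates to `True`, while passing `enabled=False` suppresses the notification.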

dennisoelkers commented 6 years ago

I think this gets way too complicated for what we want to do. There are pros and cons to just using all available cores:

Pros:

Cons:

In my operations work before Graylog, I never encountered a situation where using too many cores for a process by default broke things, but using too few did in a lot of cases.

kroepke commented 6 years ago

I don't like adding more notifications to the product.

If it is that important it should interface with a monitoring system and not display a bubble in a UI that is potentially not seen. If it is not important, I'd prefer not to be distracted by it constantly.

Perhaps our defaults are a bit conservative, but it's also not trivial to set them to "all cores", because there is more than one thread count that is important to tune.

deoren commented 6 years ago

+1 for a notification. I'm relatively new to Graylog and would benefit from one-time notifications regarding poor system configuration choices.

kroepke commented 6 years ago

I would prefer some tuning help/overview page instead of notifications. It's hard to decide when to notify: do we notify for every server in the cluster? What happens when new servers join? Do we need to notify again at some point, even if the notification was dismissed?

I think it's better to pick some sensible defaults and auto-tune based on CPU count during start. Then display those choices again on a tuning page.
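A minimal sketch of what auto-tuning at start could look like, assuming a fixed reserve and a fixed split between several thread pools. The pool names, the weights, and the function `auto_tune` are hypothetical and chosen only for illustration; they are not Graylog's actual settings or defaults.

```python
import os


def auto_tune(cpu_threads=None, reserve=2, weights=None):
    """Split the available CPU threads across several thread pools.

    Returns a dict mapping pool name to thread count. Every pool gets at
    least one thread, so on very small machines the total may slightly
    exceed (cpu_threads - reserve).
    """
    if cpu_threads is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpu_threads = os.cpu_count() or 1
    if weights is None:
        # Hypothetical pools and ratios, not real Graylog defaults.
        weights = {"process": 5, "output": 3, "input": 2}
    # Keep some threads free for the OS, GC, and other services on the box.
    budget = max(cpu_threads - reserve, len(weights))
    total = sum(weights.values())
    # Integer share per pool, rounded down, with a floor of one thread.
    return {name: max(1, budget * w // total) for name, w in weights.items()}
```

The returned mapping could then be shown on a tuning page so operators can see and override the chosen values.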

jalogisch commented 6 years ago

I'm with the idea of having details, for example on the Nodes detail page, with the settings and suggested changes, but no active notification.

deoren commented 6 years ago

@jalogisch: I'm with the idea of having details, for example on the Nodes detail page, with the settings and suggested changes, but no active notification.

I'd definitely find this useful.