Sudden burst of messages within interval

Rfferrao87 commented 6 years ago

Hello, good afternoon.

I'd like to know if there's any way to find out what NDOUtils is doing with regard to this issue. The queue throughput is fine for around 30 to 40 seconds, but after that the IPC message queue suddenly spikes to around 10,000 messages, and then the cycle begins anew.

Is this expected behaviour? Why does this happen?

Our Nagios server runs about 40,000 checks and we're using the latest NDOUtils release.

Thank you.

box293 commented 6 years ago

You're hitting the limitations of NDO as it currently stands, since it is single-threaded. I understand that the next release of NDO will include improvements.
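To watch the fill-and-spike pattern described above, the kernel's view of the queue can be sampled directly. The following is a minimal sketch, assuming a Linux host that exposes System V message queues in `/proc/sysvipc/msg`. The function names, the five-second interval, and the sample count are illustrative and not part of NDOUtils; only the `msqid`, `qnum` (messages queued), and `cbytes` (bytes queued) columns are read.

```python
import time


def parse_msg_queues(text):
    """Parse the contents of /proc/sysvipc/msg into a list of dicts.

    The first non-empty line is treated as the column header, so the
    parser does not depend on column order.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    queues = []
    for line in lines[1:]:
        fields = dict(zip(header, line.split()))
        queues.append({
            "msqid": int(fields["msqid"]),    # queue identifier
            "qnum": int(fields["qnum"]),      # messages currently queued
            "cbytes": int(fields["cbytes"]),  # bytes currently queued
        })
    return queues


def sample_queues(interval=5.0, samples=12, path="/proc/sysvipc/msg"):
    """Print one line per queue every `interval` seconds."""
    for _ in range(samples):
        with open(path) as fh:
            stamp = time.strftime("%H:%M:%S")
            for q in parse_msg_queues(fh.read()):
                print(f"{stamp} msqid={q['msqid']} "
                      f"qnum={q['qnum']} cbytes={q['cbytes']}")
        time.sleep(interval)
```

Calling `sample_queues()` on the Nagios host for a couple of minutes should show whether `qnum` climbs steadily and then jumps, or jumps from near zero, which helps tell a slow consumer apart from a bursty producer.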

Rfferrao87 commented 6 years ago

> You're hitting the limitations of NDO currently as it is single threaded. I understand that the next release of NDO will include improvements.

Just in case, is there a release date in sight for this new NDO release? Thank you for the feedback!

box293 commented 6 years ago

I'm not sure about that, sorry, but I do know it is high on the priority list.

hedenface commented 5 years ago

@Rfferrao87 Did you ever come up with a good solution for your problem? I used to manage a large Nagios installation as well. One thing that helped me was offloading ndo2db to its own dedicated, properly tuned server (and using the TCP connection between virtual machines).

Rfferrao87 commented 5 years ago

@hedenface To be frank, even though it's intermittent, I'm still quite stumped by the issue. To mitigate it temporarily, my team and I have been doing maintenance on the database side, keeping certain tables from growing too large or being optimized too often. Also, by your suggestion, do you mean we could try moving ndo2db onto the dedicated MariaDB server?

sawolf commented 5 years ago

@hedenface just stepped out of the office, so to give you a response today:

ndo2db itself will remain on your nagios server. What Bryan's referring to is this documentation, which he didn't link because it's specific to Nagios XI.

If you're running XI and you've already taken these steps (and enabled jumbo frames/ensured a fast network route between the servers), then you've already exploited that option. However, if you're running Core and/or haven't seen that documentation, you may want to take a look through it and try to implement the parts specific to ndo (which should just be the changes to ndo2db.cfg, but I could be mistaken).

Rfferrao87 commented 5 years ago

@Madlohe Is this a new feature? I've followed the documentation for a while, but this is the first time it has come to my attention. We have yet to test whether this is applicable in our environment, but do you have a procedure for actually implementing jumbo frames on Nagios XI?

sawolf commented 5 years ago

If I'm remembering correctly, the documentation I linked has been here since I started (~2 years ago). If you want assistance with performance tuning, I'd recommend starting a forum thread (here). I don't remember all of the different things we recommend, but our support technicians do tend to stay on top of that. At a minimum, though, I'd make sure that you're using a RAM disk for passive checks and performance data, and mod_gearman to distribute checks (in addition to the offloaded database). A single Nagios XI server with several mod_gearman workers should be able to handle 50-80k checks, depending on the distribution of active/passive checks, the size of the check_period, etc.

I think implementing jumbo frames is going to be dependent on your network hardware, which is why we don't have any specific articles/docs on that.

hedenface commented 5 years ago

@Rfferrao87 @Madlohe I was actually referring to moving ndo2db to the MariaDB server, yes. For this to work, you simply change from the Unix socket to TCP in ndomod.cfg and ndo2db.cfg. Jumbo frames should still apply (and be very helpful) here. Also ensure that all of the kernel tuning present on the Nagios side is present on the offloaded side.
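As a rough sketch of that socket-to-TCP change: the option names below (`output_type`, `output`, `tcp_port` for ndomod.cfg; `socket_type`, `tcp_port` for ndo2db.cfg) are assumed from the stock NDOUtils 2.x sample configs, so verify them against your installed version. The helper `set_options`, the address `192.0.2.10`, and port `5668` (the usual default) are illustrative assumptions.

```python
def set_options(cfg_text, updates):
    """Return cfg_text with each key in `updates` set to its new value.

    Existing uncommented `key=value` lines are rewritten in place;
    keys that are not present are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining and not stripped.startswith("#"):
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)  # comments and unrelated options pass through
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical address of the server that will run ndo2db and MariaDB.
DB_HOST = "192.0.2.10"

# ndomod.cfg stays on the Nagios server and points at the remote ndo2db.
NDOMOD_TCP = {"output_type": "tcpsocket", "output": DB_HOST, "tcp_port": "5668"}

# ndo2db.cfg lives on the database server and listens on TCP.
NDO2DB_TCP = {"socket_type": "tcp", "tcp_port": "5668"}
```

For example, `set_options(open("ndomod.cfg").read(), NDOMOD_TCP)` returns the edited text without touching the file, so it can be reviewed before being written back and before either service is restarted.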

Rfferrao87 commented 5 years ago

@hedenface @Madlohe Thank you for all the feedback! I'm planning to simulate these settings/topology on my lab environment asap! Will keep you posted about the results after that's done.

hedenface commented 5 years ago

@Rfferrao87 I'm going to close this issue, as the newest development branch of ndo-3 no longer faces this issue. I can't speak to a release date, but I would love some help testing. Please let me know if you're interested so we can get you set up with a pre-pre-pre-alpha release binary :)

NagiosEnterprises / ndoutils

Sudden burst of messages within interval #51