Open Obihoernchen opened 5 years ago
Note that the network stack (#7005) has nothing to do with scalability like requests/s. It's about not denying the service in general in a large environment.
@Obihoernchen Have you tried to re-use HTTP/1.1 connections? This would increase requests/s because the repeated TLS handshakes would be avoided.
I think that's the one feature we've talked about at Icinga Camp Berlin, but the description is a bit confusing. It is not about API calls, but about the cluster messages fired for command endpoint checks.
The logic should be sort of
I'm not sure how to exactly achieve this, but it sounds like a good idea to discuss.
Cheers, Michael
Not sure how much speed this will add (especially after #7005)... but the proof of the pudding is in the eating.
Has anyone ever done some profiling on this? I think the overhead of JSON-RPC itself shouldn't be high enough for combining messages to make a significant difference (if it is, maybe that overhead should be optimized instead).
I think it's more likely that the actual action behind the JSON-RPC message is expensive, and all actions would still have to be split into the individual checkables, so I have little hope that this change would make a huge improvement.
But all of this is my gut feeling, so prove me wrong if you like :)
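One rough way to put a number on that gut feeling is to compare the JSON encoding cost of 20 individual messages against a single combined one. The sketch below is only an illustration: the message shape, the method names `example::ExecuteCheck` and `example::ExecuteCheckBatch`, and the host/service names are hypothetical and are not Icinga 2's actual cluster message format. It also measures encoding only, not the check execution behind each message, which is where the real cost is suspected to be.

```python
import json
import timeit

# Hypothetical message shapes, NOT the real Icinga 2 cluster protocol.
def single_messages(n):
    """Build n separate JSON-RPC style messages, one per service."""
    return [
        {
            "jsonrpc": "2.0",
            "method": "example::ExecuteCheck",
            "params": {"host": "web01", "service": f"svc{i}"},
        }
        for i in range(n)
    ]

def batched_message(n):
    """Build one message that carries all n services."""
    return {
        "jsonrpc": "2.0",
        "method": "example::ExecuteCheckBatch",
        "params": {"host": "web01", "services": [f"svc{i}" for i in range(n)]},
    }

def encode_singles(msgs):
    """Serialize each message on its own, as separate sends would."""
    return [json.dumps(m) for m in msgs]

singles = single_messages(20)
batch = batched_message(20)

# Payload size in characters for both variants.
single_bytes = sum(len(s) for s in encode_singles(singles))
batch_bytes = len(json.dumps(batch))

# Encoding time for 1000 rounds of each variant.
t_single = timeit.timeit(lambda: encode_singles(singles), number=1000)
t_batch = timeit.timeit(lambda: json.dumps(batch), number=1000)

print(f"20 single messages: {single_bytes} chars, {t_single:.4f}s per 1000 rounds")
print(f"1 batched message:  {batch_bytes} chars, {t_batch:.4f}s per 1000 rounds")
```

If the absolute times here are tiny compared to the runtime of a single check, that would support the view that the JSON-RPC envelope is not the bottleneck.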
Often a lot of services for a given endpoint have the same check interval. For instance, take a host with 20 services attached to it, all with the same check interval of 1 minute. Of course there might be other services with different check intervals, but often at least some services share the same interval.
Current Behavior
Currently Icinga 2 sends 20 cluster messages to the remote endpoint every minute to get the results. Having a lot of hosts with a lot of services attached to a satellite node slows down the API significantly. ~This is also related to the current API issues which should be improved/fixed with the new 2.11 network stack, but this idea might be interesting anyway.~ See comments below.
Expected Behavior + Possible Solution
In my opinion it would be a nice feature if Icinga 2 tried to combine these 20 cluster messages into a single "batch" cluster message, so that all services of a host with the same check interval are sent as one cluster message. Going one step further, you could even try to combine services with 1 minute and 2 minute check intervals (the latter included in every 2nd batch), etc. I think such a feature would lower the number of cluster messages significantly and improve scalability. Edit: better description from @dnsmichi:
Possible Issues
This probably needs to be an opt-in feature because it affects timeouts and check duration.
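To illustrate the batching idea described under "Expected Behavior + Possible Solution", here is a minimal sketch that groups services by host and check interval and counts how many messages would be needed. The function name `batch_by_interval`, the tuple layout, and the host/service names are made up for this example. It does not reflect how Icinga 2 schedules checks internally.

```python
from collections import defaultdict

def batch_by_interval(services):
    """Group (host, service, interval_seconds) tuples into one batch
    per (host, interval) pair. Purely illustrative."""
    batches = defaultdict(list)
    for host, service, interval in services:
        batches[(host, interval)].append(service)
    return dict(batches)

# Hypothetical host: 20 services checked every 60 s, 2 checked every 300 s.
services = [("web01", f"svc{i}", 60) for i in range(20)]
services += [("web01", "backup", 300), ("web01", "certs", 300)]

batches = batch_by_interval(services)

messages_before = len(services)  # one cluster message per service
messages_after = len(batches)    # one cluster message per (host, interval)

print(f"messages without batching: {messages_before}")
print(f"messages with batching:    {messages_after}")
```

Note that a batch can only be answered once its slowest check has finished (or each result has to be streamed back separately), which is exactly why timeouts and check duration are affected and an opt-in seems sensible.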
Your Environment
@dnsmichi We talked about this at Icinga Camp Berlin 2019 (Markus) ;-)