dnsmichi opened 5 years ago
```
var l = 0
var c = 0

for (s in get_objects(Service)) {
  if (s.last_check_result) {
    var ex_delta = s.last_check_result.execution_end - s.last_check_result.execution_start
    var s_delta = s.last_check_result.schedule_end - s.last_check_result.schedule_start
    var latency = s_delta - ex_delta
    l += latency
    c += 1
  }
}

var avg_latency = l / c
avg_latency
```
Example for 1000 services with check_interval = 10s, calling a Perl plugin which sleeps for 10s.
Five consecutive runs of the snippet above in the debug console:

```
<112> => 0.012186
<113> => 0.012526
<114> => 0.013776
<115> => 0.013695
<116> => 0.013738
```
Counting services whose last check is older than two check intervals:

```
get_objects(Service).filter(s => s.last_check < DateTime().value - 2 * s.check_interval).map(s => s.__name).len()
```
@Al2Klimov Can you please add the findings from your Boost fiber analysis?
From a technical viewpoint, the structure of how checks are executed is just fine. With the analysis above, specific things have become clearer:
Whenever a check is executed, a task is inserted into the global thread pool. If the pool hits its maximum number of threads (concurrency = CPU core count), the remaining tasks are put on a waiting list. 2.11 adds the pending callbacks as an additional metric to inspect whether the TP is overloaded.
Whenever a check has finished, the result is posted as an asynchronous task into the global thread pool. If the TP is overloaded, the CR task has to wait.
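To make the waiting-list behaviour concrete, here's a minimal, self-contained C++ sketch of a fixed-size thread pool with a pending-task queue. This is illustrative only, not Icinga's actual ThreadPool implementation (lib/base/threadpool.hpp); all class and method names are invented. `GetPendingCallbacks()` corresponds in spirit to the pending-callbacks metric added in 2.11.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SimpleThreadPool {
public:
    explicit SimpleThreadPool(size_t threads) {
        for (size_t i = 0; i < threads; ++i)
            m_Workers.emplace_back([this] { WorkerLoop(); });
    }

    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Stopped = true;
        }
        m_CV.notify_all();
        for (std::thread& t : m_Workers)
            t.join();
    }

    // Post a task; if all workers are busy, it sits in the queue.
    void Post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Queue.push(std::move(task));
        }
        m_CV.notify_one();
    }

    // Tasks waiting for a free worker -- the spirit of the
    // "pending callbacks" metric added in 2.11.
    size_t GetPendingCallbacks() {
        std::lock_guard<std::mutex> lock(m_Mutex);
        return m_Queue.size();
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_Mutex);
                m_CV.wait(lock, [this] { return m_Stopped || !m_Queue.empty(); });
                if (m_Stopped && m_Queue.empty())
                    return;
                task = std::move(m_Queue.front());
                m_Queue.pop();
            }
            task(); // a check execution or a check result handler
        }
    }

    std::vector<std::thread> m_Workers;
    std::queue<std::function<void()>> m_Queue;
    std::mutex m_Mutex;
    std::condition_variable m_CV;
    bool m_Stopped = false;
};

int main() {
    SimpleThreadPool pool(4); // concurrency = CPU core count
    for (int i = 0; i < 100; ++i)
        pool.Post([] { /* execute a check */ });
    // destructor drains the queue before joining the workers
}
```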
If one or both of the above are true, check latency will increase. `schedule_start` is set when the check scheduler effectively inserts the task into the TP, `schedule_end` is set upon processing the check result.
If you run 1000 check_nwc_health plugins in parallel, each Perl process with a memory footprint of 3.3 MB, it takes a while to start, compile, execute and finish them. Since this also includes SNMP tree iterations, such checks may be slower than the average ping.
If ping checks always run into a timeout, the default of 10 or 30 seconds applies. During this time the process is stalled in an idle wait for ping to return. If you have 1000 tasks in the TP and 1000 of them are pings running into a timeout, you have the same problem as with an HTTP API or JSON-RPC server: it just won't accept new connections, or, in this case, it won't execute any more tasks.
Given that you may execute 200k services in a 5 to 10 minute interval, that's 20k to 40k per minute worst case. Let's stick with an average of 30k per minute, which renders this to
30000 / 60 = 500 per second
Granted that the max threads in the pool follow the CPU cores * 2, resulting in e.g. 96 * 2 = 192 possible threads working on tasks, the general throughput depends on the check execution tasks being really fast (< 1s): sustaining 500 tasks per second with 192 threads requires an average task duration below 192 / 500 ≈ 0.38s. With the aforementioned timeouts, this is impossible to reach.
If you have many CPU cores, say 32, this allows for more threads in the thread pool. But if those CPU cores are low-end at say 800 MHz, you won't win anything compared to 4 cores at 3 GHz. That gets even worse if context switches between threads become slower.
If you run 30k service checks per minute, this creates at least one cluster message for each check result being returned from the satellite to the master (and also the secondary satellite). The memory footprint of the CR mainly depends on the plugin output and performance data, and may range from a few bytes to many KB.
Unless you're on a low-bandwidth link, just sending the TCP packets that make up the JSON-RPC messages shouldn't harm the network layer. Still, the messages are wrapped into JSON, which needs to be encoded/decoded on each end. Also, when receiving such a message, it needs to be processed in order.
That being said, next to the checker's thread pool being overloaded, the cluster may be overloaded as well. You can measure this by extracting the metric from `/v1/status/ApiListener` inside the `json_rpc` struct; this is also returned by the built-in `icinga` check. If that's on a steep curve and doesn't lower itself over time, you really have a performance issue.
You can mitigate this even further by analysing the system's performance on context switching, IO and thread consumption for the process, e.g. with the sar_perf CheckCommand added in #7428.
Actually, no. Stop blaming the scheduler for everything. It isn't an AI, nor does it provide machine learning capabilities. It does have some intelligent scheduling inside to prevent problematic situations like:
Users tend to claim that check results are old after firing a reload, but this is just the spread of check execution and won't be visible anymore after the daemon has been running for some time.
With solving the plugin execution time and making things fast and efficient, the core's scheduler will perform as intended. This is also documented, and if you have a better implementation, feel free to propose it in a detailed concept.
https://icinga.com/docs/icinga2/snapshot/doc/19-technical-concepts/#check-scheduler
Ideas like learning from check execution times and putting checks into a low-latency group or a defined check group are just fine, but need to prove that they don't cause any other harm.
Also, these ideas are not for a standalone core as we had with 1.x, but relate to HA-enabled zones and, in addition, multiple levels of a distributed environment.
The actually running checks differ from what is really possible. The `MaxConcurrentChecks` setting also takes the workload of the thread pool into account: it doesn't make sense to put something into a full TP, so checks are delayed instead. This also increases the check latency. Therefore this setting may not be reached at all if the underlying performance suffers.
Users should do the following:
Developers will do the following for future releases after 2.11:
We've agreed on removing this task from 2.11, since performance does not suffer compared to older versions. One performance problem in an aforementioned environment was caused by the downtime loop, which we have already mitigated and fixed.
Therefore I am putting this into 2.12 as an ongoing task.
Note
There is absolutely no guarantee that this specific issue gets resolved in this version. We will "fork" specific work tasks from this tracking issue, and carefully decide which problems to tackle and which features to implement. The issue's state might not always be current, either.
```
mkdir /etc/systemd/system/icinga2.service.d

cat >/etc/systemd/system/icinga2.service.d/limits.conf <<EOF
[Service]
LimitNOFILE=50000
LimitNPROC=50000
TasksMax=infinity
EOF
```
Debian:

```
vim /etc/default/icinga2
```

RHEL:

```
vim /etc/sysconfig/icinga2
```

Content:

```
ICINGA2_RLIMIT_FILES=50000
```

```
systemctl daemon-reload
systemctl restart icinga2
```
Verify the limits and open file descriptors per process:

```
for p in $(pidof icinga2); do echo -e "$p\n" && ps -ef | grep $p && echo && cat /proc/$p/limits | grep 'open files' && echo; done

for p in $(pidof icinga2); do echo -e "$p\n" && ps -ef | grep $p && echo && lsof -p $p && echo; done
```
## Story
TL;DR - performance analysis and debugging of check throughput. Scroll down for problems, technical analysis and solutions.
## Issues
ref/NC/628417 ref/NC/625049
## Problems

### Performance
The multiplexed Unix socket pair with pipe streams and event FD polling introduces several locks, with concurrent check runs waiting for each other. This also blocks the throughput of the thread pool.
#7421 exposes this with the `current_pending_callbacks` metric.
#7416 discusses a problem where the number of actually running checks differs from what's actually executed as processes. Whenever the thread pool is locked up, no additional checks can be executed, even though the actually running processes tell a different story.
In theory, the throughput of executed processes should be able to exceed 1000 per second, for instance.
### Too many open files
Whenever `MaxConcurrentChecks` is raised beyond the system's open file descriptor limit, the process pipes may exceed that limit. Subsequent fork() calls then fail with "too many open files" errors, preventing the fork and thus returning a critical check result.

This can be solved by raising the number of open files within systemd and Icinga 2 (see the commands above).
### Late Check Results

## Technical Details

### Check Execution Flow

#### Check Scheduler
The checker component acts as the check scheduler; each checkable object is stored with its next execution time. Once a check is due, additional guards ensure that not all 10000 checks run in the same second, for instance (`MaxConcurrentChecks`). If the limit is hit, the scheduler waits until running processes finish and free up slots.

The check task is posted into the application's thread pool, and the scheduler continues with the other checkables.
lib/checker/checkercomponent.cpp
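For illustration, a heavily simplified sketch of the scheduling idea described above: idle checkables ordered by their next execution time, plus a MaxConcurrentChecks guard that delays further checks while all slots are taken. This is not the actual CheckerComponent code; apart from MaxConcurrentChecks, the names are invented for the example.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Checkable {
    Clock::time_point nextCheck; // next scheduled execution time
};

// Earliest next check ends up on top of the priority queue.
struct NextCheckLater {
    bool operator()(const Checkable* a, const Checkable* b) const {
        return a->nextCheck > b->nextCheck;
    }
};

class CheckScheduler {
public:
    explicit CheckScheduler(size_t maxConcurrentChecks)
        : m_MaxConcurrentChecks(maxConcurrentChecks) { }

    void Enqueue(Checkable* c) {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Idle.push(c);
        m_CV.notify_all();
    }

    // Waits for the next due checkable, but also blocks while all
    // MaxConcurrentChecks slots are taken.
    Checkable* WaitForNextCheck() {
        std::unique_lock<std::mutex> lock(m_Mutex);
        for (;;) {
            m_CV.wait(lock, [this] {
                return !m_Idle.empty() && m_Running < m_MaxConcurrentChecks;
            });

            Checkable* next = m_Idle.top();
            if (next->nextCheck <= Clock::now()) {
                m_Idle.pop();
                ++m_Running; // slot is freed again in OnCheckFinished()
                return next;
            }

            // Not due yet: sleep until its next check time (or a wakeup).
            m_CV.wait_until(lock, next->nextCheck);
        }
    }

    void OnCheckFinished() {
        std::lock_guard<std::mutex> lock(m_Mutex);
        --m_Running;
        m_CV.notify_all();
    }

private:
    std::priority_queue<Checkable*, std::vector<Checkable*>, NextCheckLater> m_Idle;
    size_t m_Running = 0;
    const size_t m_MaxConcurrentChecks;
    std::mutex m_Mutex;
    std::condition_variable m_CV;
};

int main() {
    CheckScheduler scheduler(512); // MaxConcurrentChecks default
    Checkable c{ Clock::now() };
    scheduler.Enqueue(&c);

    Checkable* due = scheduler.WaitForNextCheck();
    (void)due; // post the check task into the thread pool here
    scheduler.OnCheckFinished();
}
```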
#### Global Task Thread Pool
Once the thread pool works on the task, the registered callback function `CheckerComponent::ExecuteCheckHelper` is executed.
lib/base/threadpool.hpp
The callback function actually runs the check and notifies the scheduler when it has finished.
lib/checker/checkercomponent.cpp
#### Asynchronous Check Execution
`ExecuteCheck` prepares the check result and also decides whether to run the check locally or via command endpoint as a cluster message. The latter involves resolving the macros with a simulated check run first; this is out of scope for this workflow though.

A local check is fired based on the `check_command` attribute and its registered execution method.
lib/icinga/checkable-check.cpp
#### Generic Task Execution Functions
The following requires knowledge about the default templates and the execute methods for commands. Older versions of Icinga required importing the `plugin-check-command` template, which hid the following from the user. Newer versions of Icinga have that as default, so for example a CheckCommand already knows that it will execute a PluginCheck task.

There are a few other tasks registered, e.g. for built-in memory checks like `icinga`, `cluster-zone`, etc. These built-in check commands register different tasks than PluginCheck. This can be seen in lib/methods/methods-itl.conf which is compiled into the binary.

The CheckCommand Execute() method gets the `execute` attribute and calls the registered Invoke() interface. This allows multiple execution types with a single implementation (OOP).
lib/icinga/checkcommand.cpp
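A toy model of this registration/dispatch pattern, with invented names: execution methods register themselves in a global registry under a well-known name, and Execute() invokes whatever the command's `execute` attribute points at.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

using TaskFunc = std::function<void(const std::string& checkable)>;

// Function-local static avoids static initialization order issues.
std::map<std::string, TaskFunc>& TaskRegistry() {
    static std::map<std::string, TaskFunc> registry;
    return registry;
}

// Each execution method registers itself under a well-known name.
struct RegisterTask {
    RegisterTask(const std::string& name, TaskFunc fn) {
        TaskRegistry()[name] = std::move(fn);
    }
};

static RegisterTask l_PluginCheck("PluginCheck", [](const std::string& checkable) {
    std::cout << "spawning plugin process for " << checkable << "\n";
});

static RegisterTask l_IcingaCheck("IcingaCheck", [](const std::string& checkable) {
    std::cout << "built-in memory check for " << checkable << "\n";
});

// Execute() analogue: look up the command's "execute" attribute and
// invoke whatever task is registered under that name.
void Execute(const std::string& executeAttr, const std::string& checkable) {
    auto it = TaskRegistry().find(executeAttr);
    if (it == TaskRegistry().end())
        throw std::runtime_error("no such task: " + executeAttr);
    it->second(checkable);
}

int main() {
    Execute("PluginCheck", "host1!disk");
    Execute("IcingaCheck", "host1!icinga");
}
```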
#### PluginCheck Task
The PluginCheck task has a script function registered. Each method is put into the lib/methods library, which is highly extendable (for developers) with the simple interface used here.
lib/methods/pluginchecktask.cpp

The macro takes care of fetching the ScriptFunction from the global context, being `PluginCheck` in this case. The registered callback function `PluginCheckTask::ScriptFunc` is then invoked on call.
lib/methods/pluginchecktask.cpp
A few preparations take place before check execution:
Then it fires a generic implementation for executing a command.
lib/methods/pluginchecktask.cpp
Note the registered callback function `ProcessFinishedHandler`, which later takes care of creating the final check result with output, status from exit code, etc.

#### Prepare and Execute the Command
The function interface has quite a few parameters to pass which are needed for check execution.
lib/icinga/pluginutility.cpp
Now the following happens:

- The macros are resolved into the `command` array.
- The `command` arguments are added, if any. If that fails, the error is captured as output and returned as an UNKNOWN check result to the caller.
- The environment variables from `env` are resolved and set.

Then the final process is run. This re-binds the callback function to parse the returned output and exit state later on.
#### Enter the Process rabbit hole
The previous steps are all required to support passing check command parameters via custom variables from the DSL. The macro resolver also takes care of running lambda functions at runtime, for instance.
Now that the command line is fully assembled, we continue. The following parts reflect the implementation in Icinga 2.10.
lib/base/process.cpp
At this stage, Linux and Windows differ a bit. The following description focuses on Linux/Unix, where we support running the check scheduler in high-performance environments; Windows is just for low-energy agent checks.
lib/base/process.cpp
lib/base/process.cpp
The returned process is stored for later asynchronous event processing, reading the output from the plugin process. This is based on a limited number of IO threads, 8 in 2.10.
#### Process Spawn Helper
This is the secondary process visible with `pidof`. It listens on a Unix socket pair for event messages from the main process.

It gets created via `StartSpawnProcessHelper` as a forked process, and starts listening in the `ProcessHelper()` function as an endless loop. It decodes the received JSON message and fires specific handlers: `spawn` executes a command, `waitpid` waits until the command process is finished, and `kill` ensures that plugin timeouts are correctly dealt with. The response is JSON-encoded and sent back via the Unix socket pair.
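To illustrate the request/response cycle, here's a hypothetical, heavily simplified stand-in for the helper loop. It uses a plain line-based protocol without proper message framing instead of Icinga's JSON messages; nothing here is the actual implementation.

```cpp
#include <sys/socket.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <string>

// Helper-side loop: read a request from the socket pair, act, reply.
void ProcessHelperLoop(int fd) {
    char buf[4096];
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
        buf[n] = '\0';
        std::string msg(buf);

        if (msg.rfind("spawn ", 0) == 0) {
            // fork + exec the requested command, reply with the PID
            pid_t pid = fork();
            if (pid == 0) {
                std::string cmd = msg.substr(6);
                execl("/bin/sh", "sh", "-c", cmd.c_str(), (char*)nullptr);
                _exit(127); // exec failed
            }
            dprintf(fd, "pid %d\n", (int)pid);
        } else if (msg.rfind("waitpid ", 0) == 0) {
            // reap the child, report its exit status
            int status = 0;
            waitpid((pid_t)atoi(msg.c_str() + 8), &status, 0);
            dprintf(fd, "status %d\n", WEXITSTATUS(status));
        } else if (msg.rfind("kill ", 0) == 0) {
            // enforce the plugin timeout
            kill((pid_t)atoi(msg.c_str() + 5), SIGKILL);
            dprintf(fd, "ok\n");
        }
    }
}

int main() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return 1;

    if (fork() == 0) { // the secondary process seen in pidof
        close(fds[0]);
        ProcessHelperLoop(fds[1]);
        _exit(0);
    }

    close(fds[1]); // main process side
    dprintf(fds[0], "spawn echo hello"); // "hello" goes to the inherited stdout
    char reply[256];
    ssize_t n = read(fds[0], reply, sizeof(reply) - 1);
    if (n > 0) {
        reply[n] = '\0';
        printf("helper replied: %s", reply);
    }
    return 0;
}
```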
#### Process Spawn
This creates a `spawn` command which instruments the process spawn helper to create a new process for this plugin execution. `l_ProcessControlFD` links to the Unix socket pair descriptor which is used for communicating with the process spawn helper. This sequence also holds the pipe for stdout/stderr of the plugin's process.

#### Actually spawn a process and execute something
This is done by the helper process in `ProcessSpawnImpl()`. The child process is forked there, and its priority is niced, if configured. The actual execution happens in the built-in `icinga2_execvpe`, which uses an optimized vfork() implementation trimmed for best performance.

The output is written directly to the FD which is mapped as a pipe to the main process. You can see that with `lsof -p <mainprocessPID>`. The errno and PID are returned.
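The essence of this step, sketched with a plain fork() and a pipe; the real code uses the optimized icinga2_execvpe/vfork() path and non-blocking FDs instead:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int out[2];
    if (pipe(out) != 0) // out[0]: parent's read end, out[1]: child's stdout
        return 1;

    pid_t pid = fork();
    if (pid == 0) {
        dup2(out[1], STDOUT_FILENO); // plugin output goes straight into the pipe
        dup2(out[1], STDERR_FILENO);
        close(out[0]);
        close(out[1]);
        execlp("echo", "echo", "PING OK - rta=0.5ms", (char*)nullptr);
        _exit(128); // exec failed; the real code reports errno to the caller
    }

    close(out[1]);
    char buf[4096];
    ssize_t n;
    while ((n = read(out[0], buf, sizeof(buf) - 1)) > 0) {
        buf[n] = '\0';
        printf("plugin output: %s", buf); // this pipe FD is what lsof shows
    }

    int status = 0;
    waitpid(pid, &status, 0);
    printf("exit code: %d\n", WEXITSTATUS(status));
    return 0;
}
```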
#### The Async Process Event Loop
Again in the main process: `void Process::IOThreadProc(int tid)` gets spawned during the startup of Run(), just once. There are several registered IO threads, namely 4. All of them listen on the registered poll FDs from the Unix socket pair.
Whenever `poll` signals a new event, the current IO thread reads. And once it has read enough, it calls `DoEvents`.

#### DoEvents aka do something
Note: This is the pipe created in the beginning and wrapped into the process spawn helper.
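A minimal sketch of such a poll-driven IO thread, with invented names; the real implementation additionally tracks per-process state and only calls DoEvents once enough data has arrived:

```cpp
#include <poll.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

// One IO thread: block in poll() until a registered FD becomes
// readable, then read from it and hand the data over ("DoEvents").
void IOThreadProc(std::vector<int> fds) {
    std::vector<pollfd> pfds;
    for (int fd : fds)
        pfds.push_back({ fd, POLLIN, 0 });

    for (;;) {
        if (poll(pfds.data(), pfds.size(), -1) <= 0)
            continue;

        for (pollfd& p : pfds) {
            if (!(p.revents & (POLLIN | POLLHUP)))
                continue;

            char buf[4096];
            ssize_t n = read(p.fd, buf, sizeof(buf));
            if (n <= 0)
                return; // peer closed; a real loop would deregister the FD

            // "DoEvents": parse the buffered output and finish the
            // process object once the plugin is done
            printf("fd %d: %zd bytes read\n", p.fd, n);
        }
    }
}

int main() {
    int fds[2];
    if (pipe(fds) != 0)
        return 1;

    write(fds[1], "event", 5); // simulate plugin output arriving
    close(fds[1]);             // and the plugin process exiting

    IOThreadProc({ fds[0] }); // reads "event", then returns on EOF
}
```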
#### Process the check result
Back to lib/methods/pluginchecktask.cpp: this reads everything from the process result and creates a new check result to be processed. Since this runs in the thread pool asynchronously, it may take quite a while on heavily stressed systems.
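For reference, a sketch of the conventional plugin exit code to check state mapping (0 = OK, 1 = WARNING, 2 = CRITICAL, 3/other = UNKNOWN) that a finished handler applies; the type names are invented:

```cpp
#include <iostream>
#include <string>

enum class ServiceState { OK, Warning, Critical, Unknown };

// Conventional plugin API exit code mapping.
ServiceState ExitStatusToState(int exitStatus) {
    switch (exitStatus) {
        case 0:  return ServiceState::OK;
        case 1:  return ServiceState::Warning;
        case 2:  return ServiceState::Critical;
        default: return ServiceState::Unknown; // 3 and anything else
    }
}

struct CheckResult {
    ServiceState state;
    std::string output;
};

// Builds the final check result from the finished process.
CheckResult OnProcessFinished(int exitStatus, std::string output) {
    return { ExitStatusToState(exitStatus), std::move(output) };
}

int main() {
    CheckResult cr = OnProcessFinished(2, "DISK CRITICAL - free space: / 2%");
    std::cout << "state=" << (int)cr.state << " output=" << cr.output << "\n";
}
```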
At this point the whole check execution thingy is done. Magic, isn't it?
## Changes
### MaxConcurrentChecks

Limits the number of concurrently running check processes, defaults to 512.
### Sandboxed processes
2.6 introduced a process spawn helper which executes the checks in an isolated environment. This ensures that no file descriptors from the main process leak into the plugin's process.
This can be seen as a second icinga2 process in the `pidof icinga2` output.
This was implemented for solving #2847.
## Solutions