NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

Inefficient way ncpa gets the relevant check data #911

Open ccztux opened 1 year ago

ccztux commented 1 year ago

Every single check causes ncpa to collect all available data from the system, which is inefficient.

To analyze and demonstrate the issue, I added some extra debug log messages.

I have executed this single check:

./check_ncpa.py -H localhost -t mytoken -M cpu/percent -q 'aggregate=avg'

This results in the following log output:

2023-02-13 10:38:23,985 20788 INFO started ncpa_listener, version: 2.4.1
2023-02-13 10:38:23,986 20788 INFO Using SSL version TLSv1_2
2023-02-13 10:38:45,052 20788 DEBUG Initializing WebSocket
2023-02-13 10:38:45,053 20788 DEBUG Validating WebSocket request
2023-02-13 10:38:45,055 20788 DEBUG [ccztux's debug message] Function: 'get_root_node' was called
2023-02-13 10:38:45,056 20788 DEBUG [ccztux's debug message] Function: 'get_cpu_node' was called
2023-02-13 10:38:45,056 20788 DEBUG [ccztux's debug message] Function: 'get_memory_node' was called
2023-02-13 10:38:45,057 20788 DEBUG [ccztux's debug message] Function: 'get_disk_node' was called
2023-02-13 10:38:45,067 20788 DEBUG [ccztux's debug message] Function: 'get_interface_node' was called
2023-02-13 10:38:45,069 20788 DEBUG [ccztux's debug message] Function: 'get_plugins_node' was called
2023-02-13 10:38:45,069 20788 DEBUG [ccztux's debug message] Function: 'get_user_node' was called
2023-02-13 10:38:45,069 20788 DEBUG [ccztux's debug message] Function: 'get_system_node' was called
2023-02-13 10:38:45,069 20788 DEBUG [ccztux's debug message] Function: 'services.get_node' was called
2023-02-13 10:38:45,069 20788 DEBUG [ccztux's debug message] Function: 'process.get_node' was called
2023-02-13 10:38:45,606 20788 INFO ::1 - - [2023-02-13 10:38:45] "GET /api/cpu/percent/?token=mytoken&check=1&aggregate=avg HTTP/1.1" 200 305 0.554288

This also happens in the GUI on the following endpoints:

For https://localhost:5693/gui/top, ncpa collects only the relevant data, which is fine: https://github.com/NagiosEnterprises/ncpa/blob/9ef1e7e9fce6d640dcfc94da116b7b912403910f/agent/listener/server.py#L833-L846

btrnka63 commented 2 days ago

Hi, let me share my findings: in "server.py", the "api()" function contains a node = psapi.getter(... call, which triggers the corresponding function in "psapi.py". That function runs

if not cache:
    refresh(config)

which rebuilds the global "root" object on every call:

def refresh(config):
    global root
    # Rebuilds the whole node tree (cpu, memory, disk, ...) on every call
    root = get_root_node(config)
    return True

However, "get_root_node()" runs all of the node generators (cpu, memory, disk, etc.) every time any /api/* request is made, no matter which API path was actually requested.
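To make the cost concrete, here is a simplified Python model of the eager pattern described above. It is a sketch, not ncpa's actual psapi.py: the builder functions, the NODE_BUILDERS registry and the getter signature are hypothetical stand-ins, named after the node functions seen in the debug log. A single cpu request ends up invoking every builder.

```python
"""Simplified model of the eager node-tree build (hypothetical stand-ins,
not the actual ncpa psapi.py code)."""

from typing import Callable, Dict, List

# Records which node builders ran, mimicking the debug log lines above.
calls: List[str] = []


def make_builder(name: str) -> Callable[[dict], dict]:
    """Return a stand-in node builder that logs its call and returns dummy data."""
    def build(config: dict) -> dict:
        calls.append(name)
        return {"name": name}
    return build


# Hypothetical registry of the top-level nodes that appear in the log output.
NODE_BUILDERS: Dict[str, Callable[[dict], dict]] = {
    name: make_builder(name)
    for name in ("cpu", "memory", "disk", "interface", "plugins",
                 "user", "system", "services", "processes")
}

root: dict = {}


def get_root_node(config: dict) -> dict:
    # Eager: every child node is built, whatever path was requested.
    return {name: build(config) for name, build in NODE_BUILDERS.items()}


def refresh(config: dict) -> bool:
    global root
    root = get_root_node(config)
    return True


def getter(path: str, config: dict, cache: bool = False) -> dict:
    # Stand-in for the request handling: rebuild the tree, then look up
    # the node named by the first path segment.
    if not cache:
        refresh(config)
    return root[path.strip("/").split("/")[0]]


calls.clear()
getter("cpu/percent", {})
print(len(calls))  # 9: all builders ran for a single cpu request
```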

What about passing the "path" through "refresh()" to "get_root_node()" and building only that part of the tree?
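A minimal sketch of that proposal, using the same hypothetical stand-ins as a simplified model rather than ncpa's real API: the path is threaded through refresh() into get_root_node(), which builds only the top-level node named by the first path segment. An empty path (a request for the whole tree) still builds everything.

```python
"""Sketch of path-aware tree building (hypothetical stand-ins,
not the actual ncpa psapi.py code)."""

from typing import Callable, Dict, List

# Records which node builders ran.
calls: List[str] = []


def make_builder(name: str) -> Callable[[dict], dict]:
    """Return a stand-in node builder that logs its call and returns dummy data."""
    def build(config: dict) -> dict:
        calls.append(name)
        return {"name": name}
    return build


# Hypothetical registry of top-level node builders.
NODE_BUILDERS: Dict[str, Callable[[dict], dict]] = {
    name: make_builder(name)
    for name in ("cpu", "memory", "disk", "interface", "plugins",
                 "user", "system", "services", "processes")
}

root: dict = {}


def first_segment(path: str) -> str:
    """'cpu/percent' -> 'cpu'; '' or '/' -> ''."""
    return path.strip("/").split("/")[0]


def get_root_node(config: dict, path: str = "") -> dict:
    first = first_segment(path)
    if not first:
        # No specific node requested: build the full tree.
        return {name: build(config) for name, build in NODE_BUILDERS.items()}
    if first not in NODE_BUILDERS:
        raise KeyError(f"unknown node: {first}")
    # Build only the requested subtree.
    return {first: NODE_BUILDERS[first](config)}


def refresh(config: dict, path: str = "") -> bool:
    global root
    root = get_root_node(config, path)
    return True


def getter(path: str, config: dict, cache: bool = False) -> dict:
    first = first_segment(path)
    # root may now hold only a partial tree, so a cached lookup for a
    # node that was never built must still trigger a refresh.
    if not cache or first not in root:
        refresh(config, path)
    return root[first] if first else root


calls.clear()
getter("cpu/percent", {})
print(calls)  # ['cpu']
```

One design consequence to keep in mind: because the global root no longer always holds the full tree, any code that relies on cache=True must handle a missing node, which is why this sketch's getter falls back to a refresh in that case.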

ccztux commented 1 day ago

I believe this issue deserves a higher priority given its performance impact.