Atoptool / atop

System and process monitor for Linux
GNU General Public License v2.0

atop may hang on start #232

Closed rvarenik closed 1 year ago

rvarenik commented 1 year ago

I've come across a machine whose /proc/loadavg looks like this:

[root@server:/root]$ cat /proc/loadavg
0.45 0.73 0.82 1/4 25022
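For readers unfamiliar with the file format, here is a minimal Python sketch of how that line splits into fields. The names `LoadAvg` and `parse_loadavg` are made up for illustration and are not part of atop. The fourth field is "runnable/total" scheduling entities, so the sample above claims that only 4 threads exist system-wide.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    load1: float     # load average over 1 minute
    load5: float     # load average over 5 minutes
    load15: float    # load average over 15 minutes
    runnable: int    # currently runnable scheduling entities
    total: int       # scheduling entities (processes and threads) that exist
    last_pid: int    # PID of the most recently created process


def parse_loadavg(text: str) -> LoadAvg:
    """Split one line in /proc/loadavg format into its fields."""
    load1, load5, load15, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(load1), float(load5), float(load15),
                   int(runnable), int(total), int(last_pid))


sample = parse_loadavg("0.45 0.73 0.82 1/4 25022")
print(sample.total)  # 4
```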

Presumably this strange value led atop to hang in the loop at atop.c (960). Unfortunately, I no longer have access to the machine and cannot provide more details about such a strange value, but the whole algorithm for calculating curtlen does not look reliable:

photoproc() is likely to consume significantly more time than counttasks() does. If a host creates and terminates threads (processes) intensely, there is a good chance that: 1) a task is created after counttasks() has made its estimate; 2) another task terminates after it has been enumerated by photoproc(), so the total count of tasks does not change but there are more tasks than counttasks() predicted (plus at least 1 dead task); 3) I can imagine a situation where this happens forever (or at least for a long time), making atop either hang or burn CPU in this loop.

Since atop's biggest memory footprint occurs during peak system load, why is curtlen reevaluated every interval? I see nothing wrong with making this variable 1) nondecreasing; 2) incremented by some constant or a fraction (+10%?) every time the photoproc() result doesn't fit in curtlen elements. This may reduce CPU consumption at the cost of slightly underused memory (but only when system usage is low, so there are plenty of free resources).
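The proposal above can be sketched in Python as follows. This is only an illustration of the idea, not atop's code: `grow`, `fit_capacity`, the 10% step, the minimum step of 16 and the attempt limit are all assumptions, and `snapshot_fits` stands in for "photoproc() returned no more tasks than the buffer holds".

```python
from typing import Callable


def grow(capacity: int, percent: int = 10, minimum_step: int = 16) -> int:
    """Return a strictly larger capacity: +percent%, but at least minimum_step."""
    return capacity + max(minimum_step, capacity * percent // 100)


def fit_capacity(capacity: int,
                 snapshot_fits: Callable[[int], bool],
                 max_attempts: int = 50) -> int:
    """Grow the capacity until a snapshot fits; never shrink it.

    The attempt limit turns a potentially endless retry loop into an error.
    """
    for _ in range(max_attempts):
        if snapshot_fits(capacity):
            return capacity
        capacity = grow(capacity)
    raise RuntimeError("task buffer still too small after retries")


# Hypothetical snapshot that needs room for 150 tasks.
print(fit_capacity(100, lambda cap: cap >= 150))  # 164
print(fit_capacity(200, lambda cap: cap >= 150))  # 200 (kept, not reduced)
```

Because the capacity is carried over between intervals and only ever grows, the retry path would be taken rarely once the buffer has reached the size needed at peak load.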

Atoptool commented 1 year ago

The counttasks() function calculates a worst-case value for the number of tasks: the total number of threads plus the number of processes. A separate entry per thread is only needed for multi-threaded processes. For a single-threaded process the thread entry is not used, although it has been reserved in memory. On my system there are 376 processes with 571 threads in total, so 947 entries are reserved while only 781 are used (apparently 166 processes are single-threaded). In other words: the theoretical maximum calculated by counttasks() is only reached when all processes are multi-threaded.
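The arithmetic in this comment can be checked with a few lines of Python. The function names are made up for this illustration and only the figures come from the comment above.

```python
def reserved_entries(processes: int, threads: int) -> int:
    """Worst case: one entry per process plus one entry per thread."""
    return processes + threads


def used_entries(processes: int, threads: int, single_threaded: int) -> int:
    """A single-threaded process leaves its reserved thread entry unused."""
    return reserved_entries(processes, threads) - single_threaded


reserved = reserved_entries(376, 571)
used = used_entries(376, 571, single_threaded=166)
print(reserved, used, reserved - used)  # 947 781 166
```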

Obviously, the calculation fails when the kernel delivers wrong values, like the 4 threads in the /proc/loadavg file. To minimise the chance of a loop in atop, a consistency check has been added.
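The thread does not show the check that was actually added. The sketch below is one hypothetical form such a check could take, written in Python for brevity. It assumes the process count is obtained independently (for example by counting the numeric directories in /proc) and relies on the fact that every process has at least one thread, so a reported thread total below the process count cannot be right. The name `plausible_task_count` is made up.

```python
def plausible_task_count(nr_processes: int, nr_threads_reported: int) -> int:
    """Worst-case number of task entries, guarded against a bogus thread total.

    Hypothetical check: every process has at least one thread, so the
    thread total is clamped to be no smaller than the process count.
    """
    nr_threads = max(nr_threads_reported, nr_processes)
    return nr_processes + nr_threads


print(plausible_task_count(376, 571))  # 947: sane input, unchanged
print(plausible_task_count(376, 4))    # 752: bogus "4 threads" is clamped
```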

gitranjithkk commented 1 year ago

When can we expect these changes to be released and available at https://atoptool.nl/downloadatop.php?

glangeveld commented 1 year ago

I expect to release a new version in April/May 2023.