Atoptool / atop

System and process monitor for Linux
GNU General Public License v2.0

Offline CPUs not handled well. #261

Open bexamous opened 1 year ago

bexamous commented 1 year ago

For example, with some CPUs offline (note that cpu0 is always online):

# ls /sys/devices/system/cpu/cpu*/online | sort -V | xargs -l1 -t cat
cat /sys/devices/system/cpu/cpu1/online
0
cat /sys/devices/system/cpu/cpu2/online
0
cat /sys/devices/system/cpu/cpu3/online
0
cat /sys/devices/system/cpu/cpu4/online
0
cat /sys/devices/system/cpu/cpu5/online
0
cat /sys/devices/system/cpu/cpu6/online
0
cat /sys/devices/system/cpu/cpu7/online
0
cat /sys/devices/system/cpu/cpu8/online
0
cat /sys/devices/system/cpu/cpu9/online
0
cat /sys/devices/system/cpu/cpu10/online
0
cat /sys/devices/system/cpu/cpu11/online
0
cat /sys/devices/system/cpu/cpu12/online
0
cat /sys/devices/system/cpu/cpu13/online
0
cat /sys/devices/system/cpu/cpu14/online
0
cat /sys/devices/system/cpu/cpu15/online
0
cat /sys/devices/system/cpu/cpu16/online
1
cat /sys/devices/system/cpu/cpu17/online
1
cat /sys/devices/system/cpu/cpu18/online
1
cat /sys/devices/system/cpu/cpu19/online
1
cat /sys/devices/system/cpu/cpu20/online
1
cat /sys/devices/system/cpu/cpu21/online
1
cat /sys/devices/system/cpu/cpu22/online
1
cat /sys/devices/system/cpu/cpu23/online
1

So CPU 0 and CPUs 16-23 are online, 9 CPUs in total.
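The same information is available in one read from `/sys/devices/system/cpu/online`, which holds a range list such as `0,16-23`. Below is a minimal sketch of expanding that list. The function names `parse_cpu_list` and `read_online_cpus` are made up for illustration and are not part of atop.

```python
def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel cpulist string such as "0,16-23" into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string means no CPUs are listed
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_online_cpus(path: str = "/sys/devices/system/cpu/online") -> list[int]:
    """Read the online CPU list from sysfs (Linux only)."""
    with open(path) as f:
        return parse_cpu_list(f.read())
```

On the machine above, `parse_cpu_list("0,16-23")` yields nine entries: 0 followed by 16 through 23.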

Running atop -w atop.raw 1 to log for a bit and then looking at the log:

 atop -PPRC,ALL -r atop.raw | grep ^cpu | head -n 20
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 0 12944 16380 2868 3488371 562 0 172 0 0 800 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 1 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 2 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 3 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 4 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 5 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 6 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 7 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 8 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 0 2 1 0 97 0 0 0 0 0 800 100 110307360 46006462
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 1 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 2 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 3 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 4 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 5 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 6 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 7 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344974 2023/05/17 10:36:14 1 100 8 0 0 0 0 0 0 0 0 0 0 100 0 0
cpu ncl-119 1684344975 2023/05/17 10:36:15 1 100 0 1 3 0 96 0 0 0 0 0 800 100 204152252 77791620
cpu ncl-119 1684344975 2023/05/17 10:36:15 1 100 1 0 0 0 0 0 0 0 0 0 0 100 0 0

Now it shows info about 9 CPUs, but for CPU 0-8 rather than CPU 0 and 16-23. And since CPUs 1-8 are offline, their counters are all 0s.
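To make the mismatch explicit, here is a small sketch that pulls the CPU number out of the parseable `cpu` lines and compares it with the online set. It assumes the field layout seen in the log above (label, host, epoch, date, time, interval, ticks per second, CPU number, then counters). The helper name `reported_cpu_numbers` is hypothetical, and the sample is just three of the lines from the log.

```python
def reported_cpu_numbers(lines):
    """Return the set of CPU numbers found in parseable 'cpu' lines."""
    numbers = set()
    for line in lines:
        fields = line.split()
        # Field 7 is the CPU number in the layout assumed above.
        if len(fields) > 7 and fields[0] == "cpu":
            numbers.add(int(fields[7]))
    return numbers


# Three lines copied from the log above (CPU 0, 1 and 8).
sample = [
    "cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 0 12944 16380 2868 3488371 562 0 172 0 0 800 100 0 0",
    "cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 1 0 0 0 0 0 0 0 0 0 0 100 0 0",
    "cpu ncl-119 1684344973 2023/05/17 10:36:13 35280 100 8 0 0 0 0 0 0 0 0 0 0 100 0 0",
]

online = {0, *range(16, 24)}             # CPU 0 and 16-23, as listed in sysfs
reported = reported_cpu_numbers(sample)

missing = sorted(online - reported)      # online CPUs with no 'cpu' line
unexpected = sorted(reported - online)   # offline CPUs that got a 'cpu' line
print("missing:", missing)
print("unexpected:", unexpected)
```

With the full log, every one of CPUs 16-23 ends up in `missing` and every one of CPUs 1-8 ends up in `unexpected`.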

Note that other fields work correctly, e.g. PRC's current CPU column will often show 16-23. The interactive view is also affected. The output below is the 'fixed' view (or whatever it is called when you hit 'f'): the CPU line shows user at 804%, but all but one of the per-CPU lines it shows are offline CPUs at 0%, and oddly every one of them is named cpu000.

ATOP - ncl-119                                                      2023/05/17  10:58:53                                                --f--------------                                                 1s elapsed
PRC | sys    0.21s  | user   7.10s  |               |               | #proc    382  | #trun     11  | #tslpi   264 |  #tslpu   171 |  #zombie    0 |  clones     0 |               |               |  #exit      0 |
CPU | sys      20%  | user    804%  | irq       0%  |               | idle     77%  | wait      0%  | steal     0% |               |  guest     0% |               |  ipc     2.16 |  cycl   27MHz |  curf   88MHz |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       0%  | user      0%  | irq       0%  |               | idle      0%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     0.00 |  cycl    0MHz |             ? |
cpu | sys       8%  | user      3%  | irq       0%  |               | idle     89%  | cpu000 w  0%  | steal     0% |               |  guest     0% |               |  ipc     2.16 |  cycl  246MHz |  curf  799MHz |
CPL | avg1    1.50  |               | avg5    1.11  | avg15   0.85  |               |               | csw    21965 |               |  intr   11216 |               |               |  numcpu     9 |               |
MEM | tot    31.1G  | free    3.8G  | cache  24.5G  | dirty   2.3G  | buff  524.1M  | slab  900.4M  | slrec 660.9M |  shmem   6.4M |  shrss   0.0M |  shswp   0.0M |               |               |  numnode    1 |
SWP | tot     2.0G  |               | free    2.0G  | swcac   0.4M  |               |               |              |               |               |               |  vmcom   2.5G |  vmlim  17.5G |               |
PAG | scan       0  | steal      0  | stall      0  | compact    0  | numamig    0  | migrate    0  |              |               |               |               |  swin       0 |  swout      0 |  oomkill    0 |
PSI | cpusome  38%  | memsome   0%  | memfull   0%  |               | iosome    0%  | iofull    0%  | cs     5/1/2 |  ms     0/0/0 |  mf     0/0/0 |  is     0/0/0 |               |  if     0/0/0 |               |
DSK |      nvme0n1  | busy      0%  | read       0  | write      2  | discrd     0  | KiB/r      0  | KiB/w     56 |               |  KiB/d      0 |  MBr/s    0.0 |  MBw/s    0.1 |  avq     0.25 |  avio 2.00 ms |
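One possible direction, as a sketch only: key the per-CPU counters by the number in the `cpuN` label instead of by position or by a 0..numcpu-1 range. This assumes the counters come from `/proc/stat` and that the kernel emits `cpuN` lines only for online CPUs. Whether atop's collection code works this way has not been checked here. The helper `per_cpu_ticks` and the sample text are made up for illustration.

```python
import re

# Matches "cpu16 ..." but not the aggregate "cpu  ..." line.
CPU_LINE = re.compile(r"^cpu(\d+)\s+(.*)$")


def per_cpu_ticks(stat_text):
    """Map CPU number -> list of tick counters from /proc/stat style text."""
    result = {}
    for line in stat_text.splitlines():
        match = CPU_LINE.match(line)
        if match:
            values = [int(v) for v in match.group(2).split()]
            result[int(match.group(1))] = values
    return result


# Made-up sample in which only CPUs 0, 16 and 17 are present.
sample = """\
cpu  30 0 15 600 0 0 0 0 0 0
cpu0 10 0 5 200 0 0 0 0 0 0
cpu16 12 0 6 190 0 0 0 0 0 0
cpu17 8 0 4 210 0 0 0 0 0 0
intr 12345
"""

ticks = per_cpu_ticks(sample)
for cpu in sorted(ticks):
    # Label each row with the real CPU number rather than its position.
    print(f"cpu{cpu:03d}", ticks[cpu][:3])
```

Iterating over the keys that are actually present, rather than over a count of online CPUs, keeps rows for CPUs 16-23 and skips the offline ones.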

(edit: formatting)