Atoptool / atop

System and process monitor for Linux
GNU General Public License v2.0

Floating Point exception in Infiniband code #279

Closed: andretsb closed this issue 9 months ago

andretsb commented 9 months ago

When running atop 2.9.0 on this system, which has two InfiniBand interfaces that have never been up, atop crashes with a floating point exception.

(gdb) bt
#0  prisyst.constprop.0 (sstat=0x7f9dc3912010, nsecs=166497, avgval=<optimized out>, fixedhead=<optimized out>,
    highorderp=0x7ffd19d0401f "M0@\320\031\375\177", maxcpulines=<optimized out>, maxgpulines=999, maxdsklines=999, maxmddlines=999,
    maxlvmlines=999, maxintlines=999, maxifblines=999, maxnfslines=999, maxcontlines=999, maxnumalines=999, maxllclines=999,
    selp=<optimized out>, curline=<optimized out>) at /build/atop-A3aiul/atop-2.9.0/showlinux.c:2299
#1  0x0000555e302bd49a in text_samp (curtime=curtime@entry=1698547393, nsecs=nsecs@entry=166497,
    devtstat=devtstat@entry=0x555e302f7840 <devtstat>, sstat=sstat@entry=0x7f9dc3912010, nexit=nexit@entry=0,
    noverflow=noverflow@entry=0, flag=<optimized out>) at /build/atop-A3aiul/atop-2.9.0/showgeneric.c:497
#2  0x0000555e302c1bc4 in generic_samp (curtime=1698547393, nsecs=166497, devtstat=0x555e302f7840 <devtstat>, sstat=0x7f9dc3912010,
    nexit=<optimized out>, noverflow=<optimized out>, flag=1 '\001') at /build/atop-A3aiul/atop-2.9.0/showgeneric.c:145
#3  0x0000555e302b46cc in engine () at /build/atop-A3aiul/atop-2.9.0/atop.c:853
#4  0x0000555e302a79fc in main (argc=<optimized out>, argv=<optimized out>) at /build/atop-A3aiul/atop-2.9.0/atop.c:563

showlinux.c line 2299 is:

busy = (ival > oval ? ival : oval) *
             sstat->ifb.ifb[extra.index].lanes /
             (sstat->ifb.ifb[extra.index].rate * 10);
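Written as a formula, with the variables named as in the snippet above, the busy value is:

$$
\mathit{busy} = \frac{\max(\mathit{ival}, \mathit{oval}) \times \mathit{lanes}}{10 \times \mathit{rate}}
$$

If 'rate' is zero, as it may be for an interface that has never been up, the denominator is zero. Assuming these fields are integer counters (an assumption, not checked against the atop headers here), this is an integer division by zero. That is undefined behavior in C, and on x86 Linux it is delivered as SIGFPE, the signal the shell reports as "Floating point exception" even though no floating-point arithmetic is involved.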

The atop version is 2.9.0 (commit 957ff648436fa4a6f08ad9a8c5ea856a5f33ef5b) and the kernel version is 6.5.9. The crash appears to happen only when atop is run with -f.

natoscott commented 9 months ago

@andretsb From your description, it sounds like 'sstat->ifb.ifb[extra.index].rate' is sometimes zero. The simplest fix may be to test for this condition and set 'busy = 0' if it is true, and otherwise perform the calculation above.
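A minimal sketch of the guard suggested above, written as a standalone helper. The name 'ifb_busy' is hypothetical and not part of atop, and the 64-bit integer types are an assumption about how atop stores these counters. The arithmetic mirrors showlinux.c line 2299.

```c
/* Hypothetical helper (not atop's actual code): computes the busy value
 * for one InfiniBand port the same way as showlinux.c line 2299, but
 * returns 0 when the port reports a rate of zero instead of dividing
 * by zero. Field types are assumed to be 64-bit integer counters. */
long long ifb_busy(long long ival, long long oval,
                   long long lanes, long long rate)
{
    /* A port that has never been up may report rate == 0; without this
     * check the integer division below would be a division by zero. */
    if (rate == 0)
        return 0;

    /* Use the larger of the inbound and outbound throughput. */
    long long peak = ival > oval ? ival : oval;

    return peak * lanes / (rate * 10);
}
```

At line 2299 this would amount to 'busy = ifb_busy(ival, oval, sstat->ifb.ifb[extra.index].lanes, sstat->ifb.ifb[extra.index].rate);'. The change actually merged for this issue may be structured differently.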

Atoptool commented 9 months ago

Solved by merging pull request #281.