aristocratos / bashtop

Linux/OSX/FreeBSD resource monitor
Apache License 2.0

crash on "Division by zero" error #6

Status: Closed (closed by letompouce 4 years ago)

letompouce commented 4 years ago

bash from Debian Buster: GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)

Started with and without .bashrc (bash --norc):

New instance of /usr/local/bin/bashtop Pid: 3983
01:00:00  ERROR: On line 611
01:00:00  ERROR: On line 615
01:00:00  ERROR: On line 186
/usr/local/bin/bashtop: line 1538: ( ( 829-829 ) * 1000 * 1000 ) / ( cpu[hz]*time_elapsed*cpu[threads] ) : division by 0 (error token is "( cpu[hz]*time_elapsed*cpu[threads] ) ")

aristocratos commented 4 years ago

Does it crash instantly or sporadically? Do you have the latest version? If so, can you post the output of the "lscpu" command?

letompouce commented 4 years ago

Ahah, I'm so failing at bug reports!

It crashes "quite" instantly. I can see the boxes being drawn for a microsecond, then it vanishes.

bashtop git clone at 8a0e03f.

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
Stepping:            4
CPU MHz:             1402.918
CPU max MHz:         2900.0000
CPU min MHz:         500.0000
BogoMIPS:            4589.40
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            3072K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts
aristocratos commented 4 years ago

I'm having trouble replicating the error so far. Would you mind also posting the output of the "shopt" command?

The first error that occurs is in the second command of the following:

get_value -v cpu[threads] -sv "lscpu_var" -k "CPU(s):" -i

get_value -v cpu[cores] -sv "lscpu_var" -k "Core(s)*:" -i

I'm trying to figure out if it has something to do with the wildcard in the second command since the first seems to be working fine.

Edit: Pushed a possible fix 62d70751d5a4379ff0b8e3ea26d0120d36ece51a
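
The wildcard theory can be checked in isolation with a minimal sketch like the one below. It is not bashtop's get_value: key_matches is a hypothetical helper that treats the key as a glob pattern and tests it against one line of the English lscpu output quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not bashtop's get_value): succeed if the line
# starts with text matching the glob pattern in $1.
key_matches() {
    local pattern="$1" line="$2"
    # $pattern is left unquoted on purpose so that * acts as a wildcard;
    # the parentheses stay literal because no extglob operator precedes them.
    [[ $line == $pattern* ]]
}

line="Core(s) per socket:  2"

if key_matches "CPU(s):" "$line"; then plain="match"; else plain="no match"; fi
if key_matches "Core(s)*:" "$line"; then wild="match"; else wild="no match"; fi

echo "CPU(s):   -> ${plain}"
echo "Core(s)*: -> ${wild}"
```

On the English output the wildcard key matches its line, which suggests the wildcard itself may not be the culprit.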

letompouce commented 4 years ago

shopt:

autocd                   off
assoc_expand_once        off
cdable_vars              off
cdspell                  off
checkhash                off
checkjobs                off
checkwinsize             on
cmdhist                  on
compat31                 off
compat32                 off
compat40                 off
compat41                 off
compat42                 off
compat43                 off
compat44                 off
complete_fullquote       on
direxpand                off
dirspell                 off
dotglob                  off
execfail                 off
expand_aliases           on
extdebug                 off
extglob                  on
extquote                 on
failglob                 off
force_fignore            on
globasciiranges          on
globstar                 off
gnu_errfmt               off
histappend               on
histreedit               off
histverify               on
hostcomplete             off
huponexit                off
inherit_errexit          off
interactive_comments     on
lastpipe                 off
lithist                  off
localvar_inherit         off
localvar_unset           off
login_shell              on
mailwarn                 off
no_empty_cmd_completion  off
nocaseglob               off
nocasematch              off
nullglob                 off
progcomp                 on
progcomp_alias           off
promptvars               off
restricted_shell         off
shift_verbose            off
sourcepath               on
xpg_echo                 off

Blind guess: I found out that bashtop runs fine from a debian:10-slim Docker image (although some data is missing, since it is an unprivileged container). I don't see anything relevant in the diff, but here it is anyway:

$ diff --unified=0 shopt-from-host shopt-dockerdebian10slim
--- shopt-from-host       2020-04-08 00:11:34.856474823 +0200
+++ shopt-dockerdebian10slim    2020-04-08 00:11:52.280322109 +0200
@@ -23 +23 @@
-extglob                on
+extglob                off
@@ -30 +30 @@
-histappend             on
+histappend             off
@@ -32,2 +32,2 @@
-histverify             on
-hostcomplete           off
+histverify             off
+hostcomplete           on
@@ -41 +41 @@
-login_shell            on
+login_shell            off
@@ -49 +49 @@
-promptvars             off
+promptvars             on

62d7075 doesn't fix it.

aristocratos commented 4 years ago

Pushed another possible fix 8af367e

There are probably gonna be more errors if this fixes the error on line 615. But at least you should get some other error messages, and we might be able to narrow down the source of the error.

letompouce commented 4 years ago

error.log:

New instance of /usr/local/bin/bashtop Pid: 22842
14:50:23 ERROR: On line 613 
14:50:23 ERROR: On line 617 
14:50:23 ERROR: On line 188 
/usr/local/bin/bashtop: line 1540: ( ( 1338415-1338411 ) * 1000 * 1000 ) / ( cpu[hz]*time_elapsed*cpu[threads] ) : division by 0 (error token is "( cpu[hz]*time_elapsed*cpu[threads] ) ")

aristocratos commented 4 years ago

Just tested on a clean debian buster install with same bash version and got no errors. Would you mind looking through ~/.bashrc , ~/.profile , /etc/bash.bashrc , /etc/profile for any "shopt" or "set" commands or any sourced files you're unsure of or anything else you suspect might alter bash behaviour? Not sure what else to do since I can't replicate the issue.

letompouce commented 4 years ago

Oh, my own dotfiles have a gazillion commits from over 15 years, nothing I'm unsure of there :-)

But, as I said in the initial report, it happens without any .bashrc. I forgot to say that it was a temporary test account, so there are no special dotfile tweaks at all. Only a few files remain in /etc/profile.d/ and /etc/bash_completion.d/, dropped there by Debian Stable packages.

I booted my laptop using a debian-live-10.3.0-amd64-xfce.iso LiveUSB, ran apt install git && git clone && ./bashtop, and I can reproduce the issue, in a quite artistic manner: https://pix.toile-libre.org/upload/original/1586421232.jpg The broken screen and the dirt add some kind of poetry to it, if you ask me.

aristocratos commented 4 years ago

Yeah, it's gonna look a bit funky in the console with the standard font :) I downloaded the same image and tried it in a VM but couldn't reproduce the errors you get, though I fixed a couple of other bugs :P I've also added some extended tracing options. Would you mind upgrading to the latest version, running "bashtop --debug", and then posting tracing.log from the config dir?

letompouce commented 4 years ago

I was looking at this 53k trace and spotted a spelling error in the lscpu output: « Drapaux » should be « Drapeaux » (which means « Flags » in French). Then I realised that I was kind enough to provide you with a translated lscpu output in my reply above, and the tracing.log with French stuff in it might annoy you.

So I ran bashtop like LANG=en_US.UTF-8 bashtop --debug and voilà, it works and doesn't crash!
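
This one-off workaround relies on the shell's per-command environment: an assignment placed in front of a command is exported to that command only. Here is a small sketch of that behaviour, using a child bash in place of bashtop or lscpu so that it runs anywhere, and LC_ALL (which takes precedence over LANG and the other LC_* variables) instead of LANG:

```shell
#!/usr/bin/env bash
# Remember the parent's value so we can show it is left untouched.
before="${LC_ALL-<unset>}"

# The assignment prefix applies to this single child process only.
child="$(LC_ALL=C bash -c 'printf "%s" "$LC_ALL"')"

after="${LC_ALL-<unset>}"

echo "child saw LC_ALL=${child}"
echo "parent before/after: ${before} / ${after}"
```

The same prefix form works for any external command whose output is parsed by key, as long as the command honours the locale variables.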

Looks like it is all about i18n! For you to compare, here are both versions of lscpu from my laptop:

$ LANG=fr_FR.UTF-8 lscpu
Architecture :                          x86_64
Mode(s) opératoire(s) des processeurs : 32-bit, 64-bit
Boutisme :                              Little Endian
Tailles des adresses:                   39 bits physical, 48 bits virtual
Processeur(s) :                         4
Liste de processeur(s) en ligne :       0-3
Thread(s) par cœur :                    2
Cœur(s) par socket :                    2
Socket(s) :                             1
Nœud(s) NUMA :                          1
Identifiant constructeur :              GenuineIntel
Famille de processeur :                 6
Modèle :                                61
Nom de modèle :                         Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
Révision :                              4
Vitesse du processeur en MHz :          798.413
Vitesse maximale du processeur en MHz : 2900,0000
Vitesse minimale du processeur en MHz : 500,0000
BogoMIPS :                              4589.49
Virtualisation :                        VT-x
Cache L1d :                             32K
Cache L1i :                             32K
Cache L2 :                              256K
Cache L3 :                              3072K
Nœud NUMA 0 de processeur(s) :          0-3
Drapaux :                               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts
$ LANG=en_US.UTF-8 lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
Stepping:            4
CPU MHz:             1706.798
CPU max MHz:         2900.0000
CPU min MHz:         500.0000
BogoMIPS:            4589.49
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            3072K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts

And here's the requested tracing.log: https://gist.github.com/letompouce/06b5a8debe6aea3dde209f01e5119abc
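
The two lscpu listings suggest how a translated label can end in a division by zero. The sketch below is an illustration under assumptions, not bashtop's code: lookup is a hypothetical stand-in for a key-based parser, fed two lines copied from each listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not bashtop's get_value): print the value that
# follows the first line beginning with the literal key in $1.
lookup() {
    local key="$1" text="$2" line
    while IFS= read -r line; do
        if [[ $line == "$key"* ]]; then
            line="${line#"$key"}"                     # drop the key
            line="${line#"${line%%[![:space:]]*}"}"   # trim leading blanks
            printf '%s' "$line"
            return 0
        fi
    done <<< "$text"
    return 0   # key not found: print nothing
}

# Two lines from each lscpu output posted above.
english=$'CPU(s):              4\nThread(s) per core:  2'
french=$'Processeur(s) :                         4\nThread(s) par cœur :                    2'

threads_en="$(lookup "CPU(s):" "$english")"
threads_fr="$(lookup "CPU(s):" "$french")"

# An empty or unset variable evaluates to 0 in bash arithmetic.
divisor=$(( threads_fr ))

echo "English: threads='${threads_en}'"
echo "French:  threads='${threads_fr}' divisor=${divisor}"

# Guarding the division avoids the fatal "division by 0" error.
if (( divisor > 0 )); then
    echo "per-thread share: $(( 1000 / divisor ))"
else
    echo "thread count unavailable, skipping calculation"
fi
```

With the French labels the English key is never found, so the value stays empty and is treated as 0 when used as a divisor. The guard at the end is one possible defensive choice and is not taken from the bashtop source.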

aristocratos commented 4 years ago

Ah damn man, I thought I had fixed the external command output locale with declare LC_MESSAGES="C" LC_NUMERIC="C", and I didn't think it would be a locale error since, as you said, I got the lscpu output in English. Well, at least we found the problem :)

Ah shit, just noticed that declare LC_MESSAGES="C" LC_NUMERIC="C" should be declare -x LC_MESSAGES="C" LC_NUMERIC="C", and with that everything works fine...

Well, a big thanks for your patience in tracking it down :)

Fixed in commit 1fd6d0a
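
The difference between declare and declare -x can be seen in a few lines. This sketch uses a hypothetical variable name, DEMO_VAR, rather than LC_MESSAGES so that it does not depend on the caller's locale settings. It assumes the variable is not already exported from the environment; one that is inherited as exported would stay exported even with a plain declare.

```shell
#!/usr/bin/env bash
# DEMO_VAR is a hypothetical name used only for this illustration.
unset DEMO_VAR

# Plain declare: the variable exists in this shell only.
declare DEMO_VAR="C"
without_export="$(bash -c 'printf "%s" "${DEMO_VAR:-unset}"')"

# declare -x: the variable is exported to child processes.
declare -x DEMO_VAR="C"
with_export="$(bash -c 'printf "%s" "${DEMO_VAR:-unset}"')"

echo "plain declare : child sees ${without_export}"
echo "declare -x    : child sees ${with_export}"
```

External commands such as lscpu are child processes, so they only see locale variables that have been exported.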