Open killermoehre opened 3 years ago
Oh, and on macOS, the locale is called en_US.UTF-8
, not en_US.utf8
.
Thanks for the report. Do you experience infinite loops in the tests or outside the ble tests? cat -A
is only used in tests so infinite loops or 100% CPUs outside the tests shouldn't be related to cat -A
. Actually, tests are primarily for the development in my environment and not carefully designed to work in every environment.
Oh, and on macOS, the locale is called
en_US.UTF-8
, noten_US.utf8
.
Hmm, I don't remember well but there has been an environment where en_US.UTF-8
doesn't work, which was the reason why en_US.utf8
is used instead of en_US.UTF-8
. Also, in GNU/Linux people use en_US.UTF-8
but en_US.utf8
is usually printed as the default locale name by locale -a
.
Now maybe I need to introduce the environment-dependent tests.
I understand that cat -A
is only used for test. Is there some way to profile the called scripts to check why multiple cores get used 100% by bash
processes?
Also, the
~/.local/share/blesh/./lib/test-util.sh:1186: [[ $BASHPID == $ppid ]]
--- 49459.exit.expect 2021-08-03 16:12:22.000000000 +0200
+++ 49459.exit.result 2021-08-03 16:12:22.000000000 +0200
@@ -1 +1 @@
-0
+1
~/.local/share/blesh/./lib/test-util.sh:1193: [[ $BASHPID == $ppid ]]
--- 49464.exit.expect 2021-08-03 16:12:22.000000000 +0200
+++ 49464.exit.result 2021-08-03 16:12:22.000000000 +0200
@@ -1 +1 @@
-0
+1
looks wrong. Maybe by not finding the proper parent PID some processes run into a loop?
Hmm, these tests also need to be fixed, but this shouldn't be related to the infinite loop in your environment because these are the tests for the shell function ble/util/getpid
which is only used in Bash 3.x. Actually, ble/util/getpid
is used for the polyfill for the shell special variable BASHPID
. It doesn't do anything in Bash 4+ because Bash 4+ already has the support of BASHPID
.
OK. Any way to profile/debug such loops?
I see in htop bash processes as sister to some gawk, running amok with some neverending loop.
Sorry for the long blank. Can you get the command-line argument of the gawk
process? It is useful information to identify the related section in the codebase.
Sorry for the delay. Here is some pstree
.
|-+- 28529 killermoehre -bash
| \-+- 28530 killermoehre -bash
| |--- 28531 killermoehre -bash
| \-+- 28533 killermoehre -bash
| \--- 28535 killermoehre /usr/local/bin/gawk -v AWKTYPE=gawk -v apos=' -v _ble_bash=50108 \012 BEGIN {\012 q = apos;\012 Q = apos "\\" apos apos;\012 reason = ENVIRON["reason"];\012 is_resolve = reason == "resolve";\012 TMPBASE = ENVIRON["TMPBASE"];\012 filename_source = TMPBASE ".part";\012 if (is_resolve)\012 print "builtin history -c" > filename_source\012 entry_nline = 0;\012 entry_text = "";\012 entry_time = "";\012 if (_ble_bash >= 40400)\012 entry_time = ENVIRON["epoch"];\012 command_count = 0;\012 multiline_count = 0;\012 modification_count = 0;\012 read_section_count = 0;\012 }\012 function write_flush(_, i, filename_section, t, c) {\012 if (command_count == 0) return;\012 if (command_count >= 2 || entry_time) {\012 filename_section = TMPBASE "." read_section_count++ ".part";\012 for (i = 0; i < command_count; i++) {\012 t = command_time[i];\012 c = command_text[i];\012 if (t) print "#" t > filename_section;\012 print c > filename_section;\012 }\012 print "HISTTIMEFORMAT=%s builtin history -r " filename_section > filename_source;\012 } else {\012 for (i = 0; i < command_count; i++) {\012 c = command_text[i];\012 gsub(/'/, Q, c);\012 print "builtin history -s -- " q c q > filename_source;\012 }\012 }\012 command_count = 0;\012 }\012 function write_complex(value) {\012 write_flush();\012 print "builtin history -s -- " value > filename_source;\012 }\012 function register_command(cmd) {\012 command_time[command_count] = entry_time;\012 command_text[command_count] = cmd;\012 command_count++;\012 }\012 function is_escaped_command(cmd) {\012 return cmd ~ /^eval -- \$'([^'\\]|\\[\\'nt])*'$/;\012 }\012 function unescape_command(cmd) {\012 cmd = substr(cmd, 11, length(cmd) - 11);\012 gsub(/\\\\/, "\\q", cmd);\012 gsub(/\\n/, "\n", cmd);\012 gsub(/\\t/, "\t", cmd);\012 gsub(/\\'/, "'", cmd);\012 gsub(/\\q/, "\\", cmd);\012 return cmd;\012 }\012 function register_escaped_command(cmd) {\012 multiline_count++;\012 modification_count++;\012 if (_ble_bash >= 40400) {\012 register_command(unescape_command(cmd));\012 } else {\012 write_complex(substr(cmd, 9));\012 }\012 }\012 function register_multiline_command(cmd) {\012 multiline_count++;\012 if (_ble_bash >= 40040) {\012 register_command(cmd);\012 } else {\012 gsub(/'/, Q, cmd);\012 write_complex(q cmd q);\012 }\012 }\012 function flush_entry() {\012 if (entry_nline < 1) return;\012 if (is_escaped_command(entry_text)) {\012 register_escaped_command(entry_text)\012 } else if (entry_nline > 1) {\012 register_multiline_command(entry_text);\012 } else {\012 register_command(entry_text);\012 }\012 entry_nline = 0;\012 entry_text = "";\012 }\012 function save_timestamp(line) {\012 if (is_resolve) {\012 if (line ~ /^ *[0-9]+\*? +__ble_time_[0-9]+__/) {\012 sub(/^ *[0-9]+\*? +__ble_time_/, "", line);\012 sub(/__.*$/, "", line);\012 entry_time = line;\012 }\012 } else {\012 if (line ~ /^#[0-9]/) {\012 sub(/^#/, "", line);\012 sub(/[^0-9].*$/, "", line);\012 entry_time = line;\012 }\012 }\012 }\012 {\012 if (is_resolve) {\012 save_timestamp($0);\012 if (sub(/^ *[0-9]+\*? +(__ble_time_[0-9]+__|\?\?)/, "", $0))\012 flush_entry();\012 entry_text = ++entry_nline == 1 ? $0 : entry_text "\n" $0;\012 } else {\012 if ($0 ~ /^#[0-9]/) {\012 save_timestamp($0);\012 next;\012 } else {\012 flush_entry();\012 entry_text = $0;\012 entry_nline = 1;\012 }\012 }\012 }\012 END {\012 flush_entry();\012 write_flush();\012 if (is_resolve)\012 print "builtin history -a /dev/null" > filename_source\012 print "multiline_count=" multiline_count;\012 print "modification_count=" modification_count;\012 }\012
\-+- 28534 killermoehre -bash
|--- 28536 killermoehre -bash
\-+- 28537 killermoehre -bash
\--- 28538 killermoehre /usr/local/bin/gawk -v AWKTYPE=gawk -v apos=' -v arg_offset=0 -v _ble_bash=50108 \012 function s2i_initialize() {\012 for (i = 0; i < 16; i++)\012 xdigit2int[sprintf("%x", i)] = i;\012 for (i = 10; i < 16; i++)\012 xdigit2int[sprintf("%X", i)] = i;\012 }\012 function s2i(s, base, _, i, n, r) {\012 if (!base) base = 10;\012 r = 0;\012 n = length(s);\012 for (i = 1; i <= n; i++)\012 r = r * base + xdigit2int[substr(s, i, 1)];\012 return r;\012 }\012 function c2s_initialize(_, i) {\012 if (sprintf("%c", 945) == "α") {\012 C2S_UNICODE_PRINTF_C = 1;\012 } else {\012 C2S_UNICODE_PRINTF_C = 0;\012 for (i = 1; i <= 255; i++)\012 c2s_byte2char[i] = sprintf("%c", i);\012 }\012 }\012 function c2s(code, _, leadbyte_mark, leadbyte_sup, tail) {\012 if (C2S_UNICODE_PRINTF_C)\012 return sprintf("%c", code);\012 leadbyte_sup = 128; # 0x80\012 leadbyte_mark = 0;\012 tail = "";\012 while (leadbyte_sup && code >= leadbyte_sup) {\012 leadbyte_sup /= 2;\012 leadbyte_mark = leadbyte_mark ? leadbyte_mark / 2 : 65472; # 0xFFC0\012 tail = c2s_byte2char[128 + int(code % 64)] tail;\012 code = int(code / 64);\012 }\012 return c2s_byte2char[(leadbyte_mark + code) % 256] tail;\012 }\012 function es_initialize(_, c) {\012 es_control_chars["a"] = "\a";\012 es_control_chars["b"] = "\b";\012 es_control_chars["t"] = "\t";\012 es_control_chars["n"] = "\n";\012 es_control_chars["v"] = "\v";\012 es_control_chars["f"] = "\f";\012 es_control_chars["r"] = "\r";\012 es_control_chars["e"] = "\033";\012 es_control_chars["E"] = "\033";\012 es_control_chars["?"] = "?";\012 es_control_chars[apos] = apos;\012 es_control_chars["\""] = "\"";\012 es_control_chars["\\"] = "\\";\012 for (c = 32; c < 127; c++)\012 es_s2c[sprintf("%c", c)] = c;\012 }\012 function es_unescape(s, _, head, c) {\012 head = "";\012 while (match(s, /^[^\\]*\\/)) {\012 head = head substr(s, 1, RLENGTH - 1);\012 s = substr(s, RLENGTH + 1);\012 if ((c = es_control_chars[substr(s, 1, 1)])) {\012 head = head c;\012 s = substr(s, 2);\012 } else if (match(s, /^[0-9]([0-9][0-9]?)?/)) {\012 head = head c2s(s2i(substr(s, 1, RLENGTH), 8) % 256);\012 s = substr(s, RLENGTH + 1);\012 } else if (match(s, /^x[0-9a-fA-F][0-9a-fA-F]?/)) {\012 head = head c2s(s2i(substr(s, 2, RLENGTH - 1), 16));\012 s = substr(s, RLENGTH + 1);\012 } else if (match(s, /^U[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]([0-9a-fA-F]([0-9a-fA-F][0-9a-fA-F]?)?)?/)) {\012 head = head c2s(s2i(substr(s, 2, RLENGTH - 1), 16));\012 s = substr(s, RLENGTH + 1);\012 } else if (match(s, /^[uU][0-9a-fA-F]([0-9a-fA-F]([0-9a-fA-F][0-9a-fA-F]?)?)?/)) {\012 head = head c2s(s2i(substr(s, 2, RLENGTH - 1), 16));\012 s = substr(s, RLENGTH + 1);\012 } else if (match(s, /^c[ -~]/)) {\012 c = es_s2c[substr(s, 2, 1)];\012 head = head c2s(_ble_bash >= 40400 && c == 63 ? 127 : c % 32);\012 s = substr(s, 3);\012 } else {\012 head = head "\\";\012 }\012 }\012 return head s;\012 }\012 BEGIN {\012 s2i_initialize();\012 c2s_initialize();\012 es_initialize();\012 INDEX_FILE = ENVIRON["INDEX_FILE"];\012 opt_null = ENVIRON["opt_null"];\012 opt_source = ENVIRON["opt_source"];\012 if (!opt_null && !opt_source)\012 printf("") > INDEX_FILE; # create file\012 n = 0;\012 hindex = arg_offset;\012 }\012 function flush_line() {\012 if (n < 1) return;\012 if (opt_null) {\012 if (t ~ /^eval -- \$'([^'\\]|\\.)*'$/)\012 t = es_unescape(substr(t, 11, length(t) - 11));\012 printf("%s%c", t, 0);\012 } else if (opt_source) {\012 if (t ~ /^eval -- \$'([^'\\]|\\.)*'$/)\012 t = es_unescape(substr(t, 11, length(t) - 11));\012 gsub(/'/, "'\\''", t);\012 print "_ble_history[" hindex "]=" apos t apos;\012 } else {\012 if (n == 1) {\012 if (t ~ /^eval -- \$'([^'\\]|\\.)*'$/)\012 print hindex > INDEX_FILE;\012 } else {\012 gsub(/['\\]/, "\\\\&", t);\012 gsub(/\n/, "\\n", t);\012 print hindex > INDEX_FILE;\012 t = "eval -- $" apos t apos;\012 }\012 print t;\012 }\012 hindex++;\012 n = 0;\012 t = "";\012 }\012 {\012 if (sub(/^ *[0-9]+\*? +(__ble_ext__|\?\?)/, "", $0))\012 flush_line();\012 t = ++n == 1 ? $0 : t "\n" $0;\012 }\012 END { flush_line(); }\012
@killermoehre Thank you for the information! This is very useful hint to narrow down the problem! The first gawk is called from the shell function ble/history:bash/resolve-multiline/.awk
(defined in src/history.sh
) and the second gawk is called from the shell function ble/history:bash/load/.background-initialize
(also defined in src/history.sh
). Both are related to the initialization of the command history. I have additional questions:
Q1: In the pstree
result that you have provided, there are two gawk instances. Which one is related to the Bash of 100% CPU usage? Or both gawks have accompanying Bash processes of 100% CPU usage? You have wrote
I see in htop bash processes as sister to some gawk, running amok with some neverending loop.
Can you tell me which bash in the above pstree
result uses 100% CPU? For example, are they PID 28531 and 28536?
Q2: These processes are supposed to be performed when ble.sh session starts. Does the 100% CPU usage happens only when the ble.sh session started? Or the CPU usage is always 100% even after a long time since the ble.sh session has started? If the CPU usage goes down after a while, how long does it take?
Q3: Does this 100% CPU usage always happens when ble.sh session has started? Or does it occur at a certain probability?
Q4: While Bash occupies 100% CPU, does the corresponding interactive shell respond? If it responds, how quick is the response? Is it normal or slower than usual?
Q5: When Bash occupies 100% CPU, how is the CPU usage of gawk processes? Do they also occupy a large fraction of CPU resources?
Q6: What is the result of the following command?
$ wc ~/.bash_history # Note: the number of lines in your command-history file
$ builtin history | wc
$ time ble/util/msleep 100
- Q1: In the
pstree
result that you have provided, there are two gawk instances. Which one is related to the Bash of 100% CPU usage? Or both gawks have accompanying Bash processes of 100% CPU usage? You have wroteI see in htop bash processes as sister to some gawk, running amok with some neverending loop.
Both gawk
processes have a 100% CPU process associated.
Can you tell me which bash in the above
pstree
result uses 100% CPU? For example, are they PID 28531 and 28536?
28531
and 28536
are the 100% CPU bash
processes.
- Q2: These processes are supposed to be performed when ble.sh session starts. Does the 100% CPU usage happens only when the ble.sh session started? Or the CPU usage is always 100% even after a long time since the ble.sh session has started? If the CPU usage goes down after a while, how long does it take?
The 100% happens if I try to get the last entry from the history after I start a new shell.
- Q3: Does this 100% CPU usage always happens when ble.sh session has started? Or does it occur at a certain probability?
Always.
- Q4: While Bash occupies 100% CPU, does the corresponding interactive shell respond? If it responds, how quick is the response? Is it normal or slower than usual?
It does not response. It hangs at loading history ...
.
- Q5: When Bash occupies 100% CPU, how is the CPU usage of gawk processes? Do they also occupy a large fraction of CPU resources?
No. Just around 5%.
- Q6: What is the result of the following command?
$ wc ~/.bash_history # Note: the number of lines in your command-history file $ builtin history | wc
$ wc ~/.bash_history
549711 2631751 25820094 /Users/c5219583/.bash_history
$ builtin history | wc
549712 3181467 30117812
- Q7: What is the result of the following command?
$ time ble/util/msleep 100
$ time ble/util/msleep 100
real 0m0,104s
user 0m0,000s
sys 0m0,000s
Additional, after entering a command and hitting Return
, it takes ages (several seconds) until the command is actually executed. Seems like something with the history loading? In the process description of the terminal title there is some gawk
process as well.
Thank you! Hmm, OK. Maybe that's because your history size is large. But I still see some behavior that I don't have an idea why it behaves in such a way.
It does not response. It hangs at
loading history ....
Q8: When it hangs, is the message loading history...
always shown?
Q9: If you reduces the size of the history, does the hanging time reduce as well? For example,
$ cd
$ cp .bash_history .bash_history.backup20210825 # Please make a backup! Please restore it after testing.
$ tail -10000 .bash_history.backup20210825 > .bash_history # Smaller size of history with 10k entries
$ bash # new session
Additional, after entering a command and hitting
Return
, it takes ages (several seconds) until the command is actually executed.
I guess this is another bottleneck that is related to the large history.
erasedups
in your $HISTCONTROL
? If it contains erasedups
, can you check if the above delay before the command execution goes away when you removed erasedups
from HISTCONTROL
?
- Q8: When it hangs, is the message
loading history...
always shown?
No, only when I try to access the history by using arrow_up
to get the last command. If I execute a command from the start of my history, it just hangs.
- Q9: If you reduces the size of the history, does the hanging time reduce as well? For example,
$ cd $ cp .bash_history .bash_history.backup20210825 # Please make a backup! Please restore it after testing. $ tail -10000 .bash_history.backup20210825 > .bash_history # Smaller size of history with 10k entries $ bash # new session
Yes, it is fast.
- Q10: Do you have
erasedups
in your$HISTCONTROL
? If it containserasedups
, can you check if the above delay before the command execution goes away when you removederasedups
fromHISTCONTROL
?
$ echo $HISTCONTROL
ignoreboth:erasedups
If I remove erasedups
everything is swooping and fast with the small history.
Here is the speed matrix:
large history | small history | |
---|---|---|
w/ erasedups |
slow | slow |
w/o erasedups |
slow | fast |
For the time being I will set HISTSIZE
and HISTCONTROL
accordingly.
Here is the content of the file setting the HISTSIZE
declare -x -i HISTSIZE=$((1024 * 1024 * 5))
declare -x -i HISTFILESIZE="$HISTSIZE"
Thank you for the information.
No, only when I try to access the history by using arrow_up to get the last command. If I execute a command from the start of my history, it just hangs.
I'm confused. What does "hangs" exactly mean? Does it mean it never responded no matter how long you waited (as far as you have tested)? Or does it just mean it takes some long time but you have observed the end of the processing?
I also have tested with a large history in my Linux. With the history size ~ 5M, it becomes slow but still doesn't hang. When I increased the size to about 10M, Bash run out the memory.
Can you tell me which bash in the above
pstree
result uses 100% CPU? For example, are they PID 28531 and 28536?
28531
and28536
are the 100% CPUbash
processes.
I checked them in my Linux, and the Bash processes in that position of the process tree actually just execute the command builtin history
. If the above information is correct, it means that builtin history
takes somehow long time.
$ time builtin history >/dev/null
$ time builtin history > tmpfile
$ time builtin history | ble/bin/awk '{}'
What is the result in the plain Bash (without ble.sh loaded) (with the large history)?
$ bash --norc # plain Bash
$ time history > /dev/null
$ time history > tmpfile
$ func() { gawk '{}'; }; time history | func
$ builtin history | ble/bin/sleep 60
Here is the speed matrix:
large history small history w/ erasedups
slow slow w/o erasedups
slow fast
What are these slow and fast measure? I thought only the large history with erasedups should be slow, but it says slow even with small history with erasedups
. Since you have reported that it is several seconds with large history and erasedups
("it takes ages (several seconds) until the command is actually executed."), I expect it will become less than 10/50 sec = 0.2 sec with small history of the 1/50 size. What do these slows actually mean?
ble version: ble.sh (Bash Line Editor), version 0.4.0-devel3+a860769 Bash version: 5.1.8(1)-release (x86_64-apple-darwin20.3.0)
So, on macOS
cat
does not have the-A
option. This is probably why I see processes going up to 100 % CPU. Or it's a red herring.Maybe you find other problems. Nevertheless, ble.sh right now is super slow on macOS. I hope I can help you to fix this. I see in
htop
bash
processes as sister to somegawk
, running amok with some neverending loop.