akinomyoga / ble.sh

Bash Line Editor―a line editor written in pure Bash with syntax highlighting, auto suggestions, vim modes, etc. for Bash interactive sessions.

Performance with long histories could be better #198

Open SuperSandro2000 opened 2 years ago

SuperSandro2000 commented 2 years ago

ble version: 0.4.0-devel3+43be0e4
Bash version: 5.1.16(1)-release (x86_64-pc-linux-gnu)

I have a history with over 64,000 entries, which I suspect slows down blesh quite a bit.

I tried some poor man's benchmarking with the following commands:

$ export PS4='[${SECONDS}] ${BASH_SOURCE//$HOME/\~}:${LINENO}: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
$ bleopt debug_xtrace=blesh
$ command grep -E "^\[\[\[" blesh2 | awk '{ print $3 }' | sort | uniq -c | sort -nr | less
1351381 ble/builtin/history/option:s():
  39180 ble/canvas/trace/.impl():
  21282 ble/history/isearch-backward-blockwise():
  15735 ble/color/read-sgrspec():
   9781 ble/string#split-words():
   8671 ble/string#split():
   8640 ble/color/read-sgrspec/.arg-next():
   7560 ble/unicode/GraphemeCluster/match():
   6900 ble/canvas/trace/.implicit-move():
   5994 ble/syntax/vanishing-word/register():
   5640 ble/canvas/trace/.ps1rc():
....

$ less blesh
.....
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/string#split(): set +f
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for pat in "${pats[@]}"
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2  == ls ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for pat in "${pats[@]}"
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2  == cd [..|-|~] ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local histfile=
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n ignoreboth:erasedups ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local ignorespace ignoredups erasedups spec
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for spec in ${HISTCONTROL//:/ }
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): case "$spec" in
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): ignorespace=1
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): ignoredups=1
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for spec in ${HISTCONTROL//:/ }
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): case "$spec" in
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): erasedups=1
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2  == [\ \        ]* ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local lastIndex=64329
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( lastIndex>=0 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2  == \b\l\e\o\p\t\ \d\e\b\u\g\_\x\t\r\a\c\e\=\b\l\e\s\h\2 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): delete_indices=()
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local -a delete_indices
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local shift_histindex_next=0
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local shift_wskip=0
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local i N=64330
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i=0 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ rm ~/.local/share/atuin/last_sync_time == \l\e\s\s\ \b\l\e\s\h\2\  ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ atuin-empty-history == \l\e\s\s\ \b\l\e\s\h\2\  ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ rm ~/.local/share/atuin/session == \l\e\s\s\ \b\l\e\s\h\2\  ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ vim ~/.local/share/atuin/key == \l\e\s\s\ \b\l\e\s\h\2\  ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ echo "Co-authored-by: $line" == \l\e\s\s\ \b\l\e\s\h\2\  ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
.....

I suspect that blesh iterates over every command in my history on every command execution, which is quite slow.

akinomyoga commented 2 years ago

Yes, ble.sh needs to scan the entire history when the user has erasedups in HISTCONTROL. Maybe we can consider processing the duplicate entries with awk (when the selected awk implementation supports RS="\0"), but I'm not sure whether it would improve the performance because reading the awk output back into Bash also takes some time. We need careful benchmarking of several different implementations.
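A minimal sketch of that idea (with a hypothetical array name `entries`; ble.sh's real code differs and also has to handle entries containing newlines, which is where the RS="\0" requirement comes in): keep only the last occurrence of each entry, which is what erasedups effectively does.

```bash
# Sketch only: dedupe an array with awk, keeping the last occurrence of each entry.
# "entries" is a placeholder name, not an actual ble.sh variable.
mapfile -t entries < <(
  printf '%s\n' "${entries[@]}" |
    awk '{ line[NR] = $0; last[$0] = NR }
         END { for (i = 1; i <= NR; i++) if (last[line[i]] == i) print line[i] }'
)
```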

For a temporary workaround, you may drop erasedups from HISTCONTROL.
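For reference, the HISTCONTROL value visible in the trace above is ignoreboth:erasedups, so the workaround amounts to something like:

```bash
# e.g. in ~/.bashrc: keep ignorespace/ignoredups, drop erasedups
HISTCONTROL=ignoreboth
```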

akinomyoga commented 2 years ago

FYI, this is the loop to be optimized:

https://github.com/akinomyoga/ble.sh/blob/43be0e4d4fccfcb172a891684878eb65b02cdb2e/src/history.sh#L1382-L1390

akinomyoga commented 2 years ago

I have measured the time. When there is one duplicate, the above loop consumes 53.8% of the time of a call to ble/builtin/history/option:s with 100k mock entries. The following two lines consume 44.0% of the time:

https://github.com/akinomyoga/ble.sh/blob/43be0e4d4fccfcb172a891684878eb65b02cdb2e/src/history.sh#L1392-L1393

This line consumes 2.0% of the time:

https://github.com/akinomyoga/ble.sh/blob/43be0e4d4fccfcb172a891684878eb65b02cdb2e/src/history.sh#L1394

SuperSandro2000 commented 2 years ago

I have the following awk version:

awk --version
GNU Awk 5.1.1, API: 3.1

I hope that is new/good enough and compiled with the right options. If not, I can change that very easily.

For a temporary workaround, you may drop erasedups from HISTCONTROL.

Thanks for the hint. That would probably work for me too, since I am manually deduping my history anyway. (I think erasedups does not work well with multiple terminals and having dups quite far in the past). It would only be a bit inconvenient if the recent history wasn't deduped.


Thanks for the fast responses! I really appreciate it!


Edit: After removing erasedups my shell feels quite fast again. It's not as fast as a completely clean setup, and hitting enter on an empty line still has slightly more delay than normal, but it's not as bad as it was before. Thanks again.

akinomyoga commented 2 years ago

I implemented the loop with awk (96e9bf82, 944d48e), but it doesn't seem to significantly improve the performance. I have also considered other implementations, but there doesn't seem to be a better solution (unless we write and compile a C program to process it).

At the moment, the time taken by the `awk` implementation for a history of size 100k is approximately half that of the original naive `for` loop. The `awk` implementation can be roughly decomposed into the following three steps:

```bash
# schematic code
printf '%s\n' "${old_array[@]}" |   # Step 1) export data from Bash
  awk '{ [... process ...] }' |     # Step 2) process by awk
  mapfile -t new_array              # Step 3) import the result into Bash
```

The current bottleneck is step 1. With Bash 5.1, the naive implementation of step 1, `printf '%s\0' "${old_array[@]}"`, already takes more than half of the time of the original `for` loop. With Bash 5.2, it is actually slower than the original `for` loop. Another approach is `declare -p old_array | awk '[ parse declare -p ]'` for steps 1 and 2, which has a speed similar to `printf '%s\0' "${old_array[@]}"`. For this approach, parsing the output of `declare -p` in `awk` still takes some time: `gawk` is 3x slower than `printf '%s\0'`, while `mawk` is faster than `printf '%s\n'` by 10-20%. There are other variations and approaches to the implementation as well. The current implementation of `erasedups` tries to choose an optimal approach based on the Bash version, the `awk` implementations available on the system, and the current history size. The best combination for a large history is `mawk` & Bash 5.2 (not yet officially released).

(I think erasedups does not work well with multiple terminals and having dups quite far in the past). It would only be a bit inconvenient if the recent history wasn't deduped.

Since it doesn't seem to improve the performance as long as we process the entire history for erasedups, in commit e4afb5a I have added an option bleopt history_erasedups_limit to limit the range of the history that undergoes the erasedups checks. With the default setting history_erasedups_limit=0, only the items that have been added in the present session are the target of the erasedups checks. I think this works in most cases and also resolves the subtlety with multiple sessions and rewriting of the past history.
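In terms of configuration, that looks roughly like this (only the two values discussed in this thread):

```bash
# e.g. in ~/.blerc
bleopt history_erasedups_limit=0    # default: only dedupe entries added in this session
#bleopt history_erasedups_limit=    # empty: check the entire history as before
```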

SuperSandro2000 commented 2 years ago

Feels a bit snappier already, thanks!

akinomyoga commented 2 years ago

Thank you!

Maybe I have confused you, but with the latest version of ble.sh, the behavior of erasedups is changed from that of Bash (unless you intentionally set bleopt history_erasedups_limit=). Now, only the entries added in the same session are searched for erasedups, so the time consumed by erasedups is currently negligible; it is less than a millisecond in my environment even for 100k entries. If there is still a lag, I think that would be caused by other parts of ble.sh.

SuperSandro2000 commented 2 years ago

the behavior of erasedups is changed from that of Bash (unless you intentionally set bleopt history_erasedups_limit=). Now, only the entries added in the same session are searched for erasedups

That's not a problem. I needed to manually dedupe my history anyway because of history syncing. I am using the following snippet to do that.

history-dedupe () {
    wc -l -- "$HISTFILE";
    # Reverse the file, keep only the first (i.e. most recent) occurrence of
    # each line, restore the original order, and write it back in place.
    tac -- "$HISTFILE" | awk '!x[$0]++' | tac | sponge -- "$HISTFILE";
    wc -l -- "$HISTFILE";
    # Reload the deduplicated history into the current session.
    history -c;
    history -r
}

If there is still a lag, I think that would be caused by other parts of ble.sh.

As you probably remember, my setup is a bit complex and has a lot of moving parts. Without performance tracking of each bigger function and hook, I don't think I can easily pinpoint the issue.

akinomyoga commented 2 years ago

Without performance tracking of each bigger function and hook, I don't think I can easily pinpoint the issue.

I have supported profiling in commit f629698. The profiler can be activated by

$ ble/debug/profiler/start NAME

where NAME is used for a prefix of output filenames. The profiler can be stopped by

$ ble/debug/profiler/end

The files NAME.func.txt and NAME.line.txt will be generated (or updated if they exist already). The files can be checked by, e.g.,

$ less -RS NAME.func.txt                     # -S for long lines, -R for color codes
$ sort -nrk6 NAME.func.txt | less -RS        # sort by column 6

Also, there is a new option bleopt debug_xtrace_ps4. This value is used as PS4 for the xtrace of the internal processing of ble.sh. You specified PS4 directly in the original post, but that doesn't work anymore with the latest version of ble.sh; instead, bleopt debug_xtrace_ps4=... can be used to specify the PS4 for ble.sh internals.
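For example, the PS4 from the original post would now be set roughly like this (a sketch; the exact PS4 content is up to you):

```bash
bleopt debug_xtrace=blesh   # write the xtrace of ble.sh internals to the file "blesh"
bleopt debug_xtrace_ps4='[${SECONDS}] ${BASH_SOURCE//$HOME/\~}:${LINENO}: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
```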

The options bleopt debug_xtrace and bleopt debug_xtrace_ps4 are used by ble/debug/profiler, so they should not be changed by the user when ble/debug/profiler is activated.

SuperSandro2000 commented 2 years ago

Thank you very much!!

I now have some ideas where to look for improvements:

     304        0        0  10826.957 16.36%  10826.957 16.36%      0.000    109.084    109.084      0.000 wezterm (/home/nixos/.bash_aliases)
    2283      304      304  10158.872 15.35%  10146.311 15.33%     12.561    192.012    192.012      0.000 _sbp_set_prompt (/home/nixos/.local/share/sbp/sbp.bash)
    9243        0        0   7922.842 11.97%   7922.842 11.97%      0.000     32.517     32.517      0.000 type (/home/nixos/.bash_aliases)
  100827    65100   253932  15382.238 23.24%   3654.036 5.520%  11728.202     22.428      0.240     22.188 ble/canvas/trace/.impl (/home/nixos/.local/share/blesh/ble.sh)
   25038        0        0   2270.985 3.431%   2270.985 3.431%      0.000      4.002      4.002      0.000 ble/string#split (/home/nixos/.local/share/blesh/ble.sh)
      55       10       10   1808.791 2.733%   1808.421 2.732%      0.370    809.434    809.434      0.000 __fzf_history__ (/home/nixos/.nix-profile/share/fzf/key-bindings.bash)
   19374        0        0   1774.980 2.681%   1774.980 2.681%      0.000      3.645      3.645      0.000 ble/string#split-words (/home/nixos/.local/share/blesh/ble.sh)
   45801    49203    52668   4910.018 7.418%   1697.583 2.565%   3212.435     10.717      0.062     10.655 ble/color/read-sgrspec (/home/nixos/.local/share/blesh/ble.sh)
    9330        0        0   1507.550 2.277%   1507.550 2.277%      0.000      3.796      3.796      0.000 ble/arithmetic/sum (/home/nixos/.local/share/blesh/ble.sh)
     608      304      304   1174.213 1.774%   1154.985 1.745%     19.228     10.444     10.261      0.183 _atuin_precmd (/home/nixos/.bashrc)
   23810    52393   620248  54470.389 82.29%    981.706 1.483%  53488.683    667.423      0.457    666.966 ble/textarea#render (/home/nixos/.local/share/blesh/ble.sh)
   22176        0        0    682.019 1.030%    682.019 1.030%      0.000      0.826      0.826      0.000 ble/color/read-sgrspec/.arg-next (/home/nixos/.local/share/blesh/ble.sh)
   17489    17449   424790  46707.926 70.56%    681.222 1.029%  46026.704    391.817      0.141    391.676 ble/prompt/update (/home/nixos/.local/share/blesh/ble.sh)

Do you have any idea why the following function is at the top? It seems very basic to me and should match the first if every time.

wezterm() {
  if [[ -e /etc/NIXOS ]]; then
    command wezterm "$@"
  elif type -P nixGL &>/dev/null; then
    nixGL wezterm "$@"
  elif type -P nixGLIntel &>/dev/null; then
    nixGLIntel wezterm "$@"
  fi
}

It is used by https://github.com/wez/wezterm/blob/main/assets/shell-integration/wezterm.sh#L396-L403 which should be executed once per command run.


updated if they exist already

Amended or replaced? When using the option multiple times without clearing, I got totally different results compared to deleting the files between tries.

akinomyoga commented 2 years ago

Do you have any idea why the following function is at the top? It seems very basic to me and should match the first if every time.

wezterm() {
  if [[ -e /etc/NIXOS ]]; then
    command wezterm "$@"
  elif type -P nixGL &>/dev/null; then
    nixGL wezterm "$@"
  elif type -P nixGLIntel &>/dev/null; then
    nixGLIntel wezterm "$@"
  fi
}

Spawning external commands generally has a larger cost than running builtin commands.

For this function, the time is on average 10826.957/304 = 35.615 msec, and the maximum time for a single call is 109.084 msec. These values seem to be a bit heavier than a normal spawning case, so wezterm itself might just be slow. Hmm, could you tell me what the spawning cost is on your system?

$ while date; do :; done | uniq -c
    507 2022年  6月 30日 木曜日 04:36:32 JST
    585 2022年  6月 30日 木曜日 04:36:33 JST
    280 2022年  6月 30日 木曜日 04:36:34 JST
    438 2022年  6月 30日 木曜日 04:36:35 JST
    480 2022年  6月 30日 木曜日 04:36:36 JST
    477 2022年  6月 30日 木曜日 04:36:37 JST
    475 2022年  6月 30日 木曜日 04:36:38 JST
    542 2022年  6月 30日 木曜日 04:36:39 JST
    545 2022年  6月 30日 木曜日 04:36:40 JST
    539 2022年  6月 30日 木曜日 04:36:41 JST
    536 2022年  6月 30日 木曜日 04:36:42 JST

# kill it by pressing C-c

The fork cost on my system for date is about 1000/500 = 2 msec, which is significantly smaller than the reported wezterm spawning cost.

It is used by https://github.com/wez/wezterm/blob/main/assets/shell-integration/wezterm.sh#L396-L403 which should be executed once per command run.

Hmm, I guess you should always use the fallback version as long as PWD contains only ASCII characters. Or we may request another wezterm-specific terminal sequence to invoke set-working-directory; communicating through the tty is much faster than communicating by creating a process.

Also, possibly this function __wezterm_osc7 could be called only when the current working directory has changed (instead of running it every time), e.g. by using blehook CHPWD?
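A minimal sketch of that idea, assuming __wezterm_osc7 is the function defined by wezterm.sh and using ble.sh's CHPWD hook:

```bash
# Report the working directory only when it actually changes, instead of on every prompt.
blehook CHPWD+=__wezterm_osc7
```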

updated if they exist already

Amended or replaced? When using the option multiple times without clearing, I got totally different results compared to deleting the files between tries.

Amended. The results are summed up: when the profiling is performed multiple times for the same NAME, the final files contain the sum of the times and counts over all the profiling sessions.

SuperSandro2000 commented 2 years ago

Hmm, could you tell me what the spawning cost is on your system?

Not that different from yours:

 ▶ while date; do :; done | uniq -c
     17 Thu Jun 30 11:15:33 AM CEST 2022
    545 Thu Jun 30 11:15:34 AM CEST 2022
    507 Thu Jun 30 11:15:35 AM CEST 2022
    460 Thu Jun 30 11:15:36 AM CEST 2022
    489 Thu Jun 30 11:15:37 AM CEST 2022
    519 Thu Jun 30 11:15:38 AM CEST 2022
    495 Thu Jun 30 11:15:39 AM CEST 2022
    562 Thu Jun 30 11:15:40 AM CEST 2022
    589 Thu Jun 30 11:15:41 AM CEST 2022
    570 Thu Jun 30 11:15:42 AM CEST 2022
    589 Thu Jun 30 11:15:43 AM CEST 2022
    574 Thu Jun 30 11:15:44 AM CEST 2022
    570 Thu Jun 30 11:15:45 AM CEST 2022
    564 Thu Jun 30 11:15:46 AM CEST 2022
    559 Thu Jun 30 11:15:47 AM CEST 2022
    536 Thu Jun 30 11:15:48 AM CEST 2022
    524 Thu Jun 30 11:15:49 AM CEST 2022
    486 Thu Jun 30 11:15:50 AM CEST 2022
    522 Thu Jun 30 11:15:51 AM CEST 2022

Also, possibly this function __wezterm_osc7 could be called only when the current working directory has changed (instead of running it every time), e.g. by using blehook CHPWD?

This helped. Thanks


Now the only problematic line left, other than the fzf_history binding (which blocks while you search), is:

2283      304      304  10158.872 15.35%  10146.311 15.33%     12.561    192.012    192.012      0.000 _sbp_set_prompt (/home/nixos/.local/share/sbp/sbp.bash)

I am not sure how to improve that. SBP spawns a new bash and executes quite a few functions in it: https://github.com/brujoand/sbp/blob/master/sbp.bash#L67

I'll probably play around with that in the next days to see if I can convert it into something that is sourced, or only source the functions once.

akinomyoga commented 2 years ago

Thank you! This means that wezterm itself is slow. Yeah, I think we can improve the performance of SBP by making it work in the same process, though it might require some structural changes.

akinomyoga commented 1 year ago

The upstream SBP doesn't seem to be active enough. Maybe we can make a fork of SBP as a "Source" repository (in the GitHub terminology) and perform the suggested structural change for performance improvements.

SuperSandro2000 commented 1 year ago

I have noticed that SBP is stale too, but I am not sure how much I could contribute to that. As you probably know, I have also had multiple additions for it open for quite a while.