Pre-PR: [auto] completion performance improvements

Hi @akinomyoga :)

First, thank you for this software, it's very nice and useful, and BTW writing such a huge code base in pure bash and still manage to get it working properly is almost an artistic performance so congratulations xD.

I'm experiencing completion-related performance issues with ble.sh, both v0.3.2 and d84bcd8 (master). For me, completion becomes very slow (and is thus unusable) in two specific situations: dynamic command completion with a very large number of candidates (12438), and filename completion in a directory containing a very large number of files (57341).

I'm currently trying to fix these two issues starting from d84bcd8 (master). I will submit a PR when I'm done. In the meantime I wanted to open this issue to get your feedback.

For now I'm focused on the second case only: filenames. After some digging, I think the performance bottleneck is, for this specific case, located on this loop, on that loop, and also here.

Concerning the last line (pattern expansion), there is a huge performance regression in d84bcd8 (master) whereas on v0.3.2 it's mostly fine: the generated pattern is simpler (and way faster) in v0.3.2.

I'm experimenting with two new options:

$bleopt_complete_max_precandidates to abort completion as soon as we know that the number of candidates would be too high to be useful (right after pattern expansion). Then we could simply display "too much candidates". Fish is doing something similar.
$bleopt_complete_advanced_globbing to switch to simpler/faster globbing patterns (it could also be something like $bleopt_complete_globbing_type to switch between a very simple pattern, the pattern of v0.3.2, and that of master).

Maybe this could also help with #58 and #59.

To be continued...

EDIT I forgot to mention that pattern expansion is only slow with bleopt complete_ambiguous=1. Still, I think it would be nice to provide a simpler pattern option for the ambiguous case (like /*dir*/*file* instead of /*d*i*r*/*f*i*l*e*).

Hi @timjrd! First of all, thank you for your interest in this project and also getting interested in looking at its codebase! You are the first person who really dug into the code and expressed it to others :)

First, thank you for this software, it's very nice and useful, and BTW writing such a huge code base in pure bash and still manage to get it working properly is almost an artistic performance so congratulations xD.

Thank you very much for your encouraging words! Actually, the project started with just hundreds of lines. I didn't expect that the code becomes this large as I originally just wanted highlighting but not to implement a line editor.

I'm currently trying to fix these two issues starting from d84bcd8 (master). I will submit a PR when I'm done. In the meantime I wanted to open this issue to get your feedback.

OK! Questions are always welcome. If you have any questions you can ask them here.

Heavy loops in `source:file`

located on this loop, on that loop,

The fix for these loops is probably easy. Actually, ble.sh has a mechanism to cancel the completion generation when the user has inputs (see, e.g., bleopt complete_polling_cycle). The user input checking is performed in heavy loops but is not performed in the above loops because I didn't think it is heavy at the time I implemented it. You can just add the following line in these loops (just as in this loop):

((cand_iloop++%bleopt_complete_polling_cycle==0)) && ble/complete/check-cancel && return 148

Note that the local variable cand_iloop is defined in a caller scope.

I believe this should solve the problem because actually there are several heavier loops (with user input checking) in the entire candidate generation process but you haven't recognized them as bottlenecks. But, let's gauge it after we add the above line in these loops. If it does solve the problem, I think we don't have to add the option like bleopt_complete_max_precandidates. ~~I guess the reason that fish doesn't show too many candidates is not due to the performance issue but the display problem~~ (Edit I checked the behavior of fish. It actually does the right thing when there are so many candidates that don't fit into the terminal.)

Ambiguous pathname patterns

EDIT I forgot to mention that pattern expansion is only slow with bleopt complete_ambiguous=1. Still, I think it would be nice to provide a simpler pattern option for the ambiguous case (like /dir/file instead of /dir/file*).

The situation is a little bit complicated in ble.sh. In the ambiguous completion in ble.sh, it tries to perform three different levels of ambiguous completions: it tries substr matching (comp_type field m), hsubseq matching (a), and subseq matching (A). The substr matching is something like *file*, the hsubseq matching is something like f*i*l*e*, and the subseq matching is something like *f*i*l*e*. ble.sh tries these ambiguous matchings in this order and stop candidate generation when some candidates are first found. The corresponding code for this branching is here.

It is tedious to implement all of these different levels of ambiguous completion generation for each completion sources, so ble.sh uses the approach for ambiguous completion to first let the completion source generate all possible candidates and filter the candidates outside the source. But, of course, It is optional to pre-filter the candidates inside the source. In fact, source:file does a partial pre-filtering on pathname patterns in the function ble/complete/source:file/.construct-pathname-pattern. For substr (m) matching, it uses the common pattern with subseq (A) like *f*i*l*e*. I think you can here add another branch for substr (m) to generate the pattern like *file*.

Another thing to note is that I recently changed in the commit 71afaba6f6788c7f9020591437ab88343fe94e22 the pathname pattern *f*i*l*e* to *([!f])f*([!i])i*([!l])l*e* (where *( ... ) is an extglob construct) to work around the superlinear performance of Bash's glob engine.

However, if no candidates are found for the substr matching, it still falls to hsubseq matching or subseq matching. Maybe we can add another way to specify the enabled ambiguous completion matching. For example, we may extend bleopt complete_ambiguous to accept enabled ambiguous matching as e.g. bleopt complete_ambiguous=substr:hsubseq:subseq. What do you think?

Hi! Thanks for your detailed feedback!

You can just add the following line in these loops

It is slightly better with this check: the UI is more responsive as I can abort the completion earlier by moving the cursor.

If it does solve the problem, I think we don't have to add the option like bleopt_complete_max_precandidates.

I think such a short-circuit could still be useful and would provide a better/snappier user experience in some (extreme) cases like mine. Perhaps it would help with big histories too.

For example, when I do:

cd /some/huge/dir/<TAB>

Bash can easily max up one core for ~15 seconds before showing me the results: here there are more than 20k sub-directories, so it's perfectly understandable. But past a certain (large) number of candidates, I think it's useless to display them, IMO it would be better to abort completion and just display "too many candidates, be more precise".

In fact, tab-completion is not the biggest issue. The real problem is when auto-suggestion is enabled: if I know that some directory is big, I could refrain from using tab-completion on that directory, but auto-completion is still trying to show me something and Bash is going crazy in those cases.

To sum up, given the performance limitations of Bash, I think it's perfectly reasonable to give up on completion as soon as we know that the said completion cannot be achieved in a sufficiently short amount of time.

Another thing to note is that I recently changed in the commit 71afaba the pathname pattern *f*i*l*e* to *([!f])f*([!i])i*([!l])l*e* (where *( ... ) is an extglob construct) to work around the superlinear performance of Bash's glob engine.

I did some quick benchmarking of Bash globbing. Here is the script I used:

function glob {
  local res=()
  res=($1)
  ret="${#res[@]}"
}

function nglob {
  echo "$2"
  local n=$1
  local prev=
  while (($n)); do
    glob "$2"
    [[ -n $prev && $prev != $ret ]] && return 1
    prev=$ret
    n=$(($n-1))
  done
}

function ble/string#split {
  if [[ -o noglob ]]; then
    IFS=$2 builtin eval "$1=(\${*:3}\$2)"
  else
    set -f
    IFS=$2 builtin eval "$1=(\${*:3}\$2)"
    set +f
  fi
}

function simple {
  local path=$1
  local pattern= i=0
  local names; ble/string#split names / "$1"
  local name
  for name in "${names[@]}"; do
    ((i++)) && pattern=$pattern/
    if [[ $name ]]; then
      pattern=$pattern*$name*
    fi
  done
  [[ $pattern ]] || pattern="*"
  ret=$pattern
}

function v0.3.2 {
  local path=$1 fixlen=0
  local pattern= i=0 j
  local names; ble/string#split names / "$1"
  local name
  for name in "${names[@]}"; do
    ((i++)) && pattern=$pattern/
    if [[ $name ]]; then
      ret="${name::fixlen}"
      pattern=$pattern$ret*
      for ((j=fixlen;j<${#name};j++)); do
        ret="${name:j:1}"
        pattern=$pattern$ret*
      done
    fi
  done
  [[ $pattern ]] || pattern="*"
  ret=$pattern
}

function master {
  local path=$1 fixlen=0
  local pattern= i=0 j
  local names; ble/string#split names / "$1"
  local name
  for name in "${names[@]}"; do
    ((i++)) && pattern=$pattern/
    if [[ $name ]]; then
      ret="${name::fixlen}"
      pattern=$pattern$ret*
      for ((j=fixlen;j<${#name};j++)); do
        ret="${name:j:1}"
        if [[ $pattern == *\* ]]; then
          pattern=$pattern'([!'$ret'])'
        fi
        pattern=$pattern$ret*
      done
    fi
  done
  [[ $pattern ]] || pattern="*"
  ret=$pattern
}

shopt -s extglob

path=/nix/store/firefox
n=100

echo '$BASH_VERSION' "$BASH_VERSION"
echo "$path" "x$n"
echo

ret=
simple "$path"
time -p nglob $n "$ret" || echo FAILED
echo

ret=
v0.3.2 "$path"
time -p nglob $n "$ret" || echo FAILED
echo

ret=
master "$path"
time -p nglob $n "$ret" || echo FAILED
echo

And here are the results:

$BASH_VERSION 5.0.17(1)-release
/nix/store/firefox x100

/*nix*/*store*/*firefox*
real 2,10
user 1,10
sys 1,00

/*n*i*x*/*s*t*o*r*e*/*f*i*r*e*f*o*x*
real 2,21
user 1,14
sys 1,07

/*([!n])n*([!i])i*([!x])x*/*([!s])s*([!t])t*([!o])o*([!r])r*([!e])e*/*([!f])f*([!i])i*([!r])r*([!e])e*([!f])f*([!o])o*([!x])x*
real 133,10
user 132,07
sys 1,03

As you can see, the extglob pattern introduced in 71afaba6f is more than 60 times slower than that of v0.3.2.

Maybe we can add another way to specify the enabled ambiguous completion matching [...] What do you think?

As you said, I could add a dedicated substr implementation for filenames (ble.sh tries substr, then hsubseq, then subseq right?). Also, I think the new extglob pattern is too slow. Maybe we could revert to that of v0.3.2? Without the extglob pattern, I think that pattern expansion is fast enough, and that we don't need an option for that. However I do think that an option like $bleopt_complete_max_precandidates is still worthy xD.

Thanks!

Maximal number of candidates

Bash can easily max up one core for ~15 seconds

Oh, I see. I didn't come up with that fact. I usually don't care about the CPU usage so much for my personal use, but we should care about it.

However I do think that an option like $bleopt_complete_max_precandidates is still worthy xD.

OK, let's add the option! ...though I want to discuss the naming precandidates more.

I guess pre comes from the way how the ambiguous completion for pathnames is specifically implemented in ble.sh which is not visible to users. This is just a specific implementation detail. I'm kind of unwanting pre.
In the codebase, I used the word "candidate", but I'm actually not sure if this choice of a word is natural. I wanted to ask someone about this regularly. Bash doesn't call them "candidates" but just "completions". If you find completions more natural than candidates, you can switch to complete_max_completions (though the repetition of "complete" seems wordy).
Also, if this will be the option specific to source:file, we should not use such a general name. In this case, e.g. something like complete_max_pathnames might be better. If you want to make it a more general option, you can keep it as candidates (or completions). However, in this case, we need to think more about how to implement it in other sources. Also, I'm not sure whether the common threshold among different types of sources makes sense.

Ambiguous matching by extglob

I did some quick benchmarking of Bash globbing. Here is the script I used [...] As you can see, the extglob pattern introduced in 71afaba is more than 60 times slower than that of v0.3.2.

Yes. I can easily guess that the extglob should be slower in general by a constant factor. But the "worst" cases for the naive pattern construction by inserting just * is catastrophic: the processing time grows exponentially with the pattern length. For example, you can see the behavior of the Bash glob engine against bad cases in the following example:

pattern=a
for i in {1..20}; do
  echo -n "$pattern"
  time [[ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab == $pattern ]]
  pattern+=*a
done |& grep -Ev 'user|sys'

This problem is known as the super-linear performance or the exponential performance in the context of regular expressions. If the matching engine is properly implemented (e.g. using the DFA-based approach), the (true) regular language is always tested by the linear time with respect to the input length. But one can easily fall into a bad implementation with the super-linear performance if one naively implement the matching engine (specifically it becomes polynomial wrt input length and its power is proportional pattern length in the worst case). The implementation of Bash glob must be such a naive one.

Also, I think the new extglob pattern is too slow. Maybe we could revert to that of v0.3.2?

Even if it is slow, it's much better than hanging Bash. This slow matching cannot be canceled by scripts or by pressing C-c because it occurs in a primitive operation of Bash. As a result, the shell looks like completely hanging with a not-so-large length of patterns and inputs.

Also, I don't know how many files are in your /*n*i*x, but 133.10 / 100 = 1.3 sec (decimal point is a comma in your locale, right?) is slow but not too slow if there are a large number of files.

But maybe there is a way to improve the performance by even tweaking the patterns keeping the linear performance of the worst case. Do you have any idea?

`source:file` for `substr`

As you said, I could add a dedicated substr implementation for filenames

OK!

(ble.sh tries substr, then hsubseq, then subseq right?).

Right.

OK, let's add the option! ...though I want to discuss the naming precandidates more.

Sure, "precandidates" is a bit awkward. Maybe we could name it complete_early_abort? Where an empty value means "no early abort", and where a non-empty value would be a positive integer and defined as:

A hint on the maximum acceptable size of any data structure generated during the completion process, beyond which the completion may be prematurely aborted to avoid excessive processing time.

For filenames, this check would happen right after pattern expansion. Ideally it should be applied everywhere as soon as we know that a generated input is too big. Maybe I could apply the check on each source right after some input is generated, and abort completion using an exit code like return 148 for ble/complete/check-cancel?

Also I think it would be better to enable this option by default, with a fairly large value. I think that the average user could be discouraged from using ble.sh if they face a situation where completion is too resource-hungry and simply conclude that "it's too slow".

you can see the behavior of the Bash glob engine against bad cases in the following example

So by using a tricky pattern you trigger an alternative glob engine (ie. a proper regexp engine)?

Actually, this issue seems to have been fixed on Bash 5.0.17 as I can run your example almost instantaneously (even with {1..1000}). However, Bash 4.4.12 is struggling as you described. I will compile the last 4.* version of Bash and the first 5.* to see if the performance improvement came with version 5. Maybe we could check the Bash version and revert to v0.3.2 behavior for hsubseq and subseq only if we run on Bash 5+ ?

Maybe we could name it complete_early_abort? Where an empty value means "no early abort", and where a non-empty value would be a positive integer and defined as:

Maybe I'm repeating the discussion of pre, but I feel that we don't need early here. Users won't know whether the abort occurred early or late in the completion process. There are no essential differences in the visble behavior even if the limit check is performed in a later stage. I feel early just represents the implementation detail of how the limit is implemented so is redundant.

A hint on the maximum acceptable size of any data structure generated during the completion process, beyond which the completion may be prematurely aborted to avoid excessive processing time.

OK, as you assume such general usage, how about complete_limit? There are already a few options with the name limit (e.g. line_limit_type) for the user experiences.

Anyway, the option name can be anytime replaced before the merge, so you can start working on it now.

Maybe I could apply the check on each source right after some input is generated, and abort completion

I think the check needs not necessarily be performed inside the source. We can check the number of candidates just after the call of the source. Only when there is a bottleneck inside the source, one may additionally limit the number inside the source.

and abort completion using an exit code like return 148 for ble/complete/check-cancel?

The exit status 148 is primarily used to represent "there are user inputs" in ble.sh and causes the entire completion to abort. In the case that a specific source hits the limit, the other sources may still provide a decent number of candidates. Can you just return the function without yielding candidates? You can return the exit code 1 though currently it is not used in ble.sh.

So by using a tricky pattern you trigger an alternative glob engine (ie. a proper regexp engine)?

FWIW, the engine is the same but the tricky pattern suppresses the backtracking by specifying more deterministic glob patterns. Glob patterns can be regarded as another way of describing a regular language (just like the regular expressions). There can be several ways to specify an equivalent language by the glob pattern: e.g. a*b and a*([!b])b is equivalent in the sense that both accept exactly the same set of strings. Each pattern corresponds to some automatons. In general, such automatons are non-deterministic finite automatons (NFA) but can be always converted to deterministic one (DFA). NFA has the problem of the super-linear complexity when implemented naively while DFA is not suffered. a*(![b])b is not totally deterministic but more deterministic so that it can evade the performance problem.

Actually, this issue seems to have been fixed on Bash 5.0.17 as I can run your example almost instantaneously (even with {1..1000}). However, Bash 4.4.12 is struggling as you described.

Ah, OK. I forgot that. That was actually written in the corresponding development log (it's written in Japanese, so I'm not requesting others to check these logs). Sorry about forgetting to tell you that.

Maybe we could check the Bash version and revert to v0.3.2 behavior for hsubseq and subseq only if we run on Bash 5+ ?

OK. That's a good idea. You can test against the variable _ble_bash which contains the Bash version information in a form e.g. 40412.

OK! This exchange was very insightful. I think I have enough information to produce a meaningful patch. I may take some time to do it but I will definitely submit a PR!

OK! Thank you! I'm looking forward to seeing the PR!

akinomyoga / ble.sh