Smart-case tab completion has become less smart

zx8 commented 3 years ago

Using https://github.com/fish-shell/fish-shell/commit/9c413b039d572963dd5cf9a673ec47927d584df2.

It's unclear if this is intentional, but it seems like a regression in smart-case tab completion:

Repro

$ mkdir dog Duck Dodo
$ cd do<tab>

Expected

Both dog and Dodo should be offered.

Actual

dog is automatically completed - Dodo is not offered.

Rationale

When using lowercase and tab-completing, it's because I can't remember which directories are lowercase and which are uppercase. When I use uppercase (i.e. cd Do<tab>), it's because I know a particular directory starts with an uppercase letter and am going out of my way to enter an uppercase character, so in this case I would expect only Dodo to be offered.

This would be consistent with how smart-case works with all other CLI tools I use, so I expect this is a regression (since it didn't used to work this way in fish either).

zanchey commented 3 years ago

I don't know about a regression - this is how 3.0.2 and 3.1.2 behave as well. complete -C shows the right result, but something else removes Dodo as a valid completion when using Tab.

zanchey commented 3 years ago

krobelus commented 3 years ago

In the interactive reader only the best completions are shown. The rankings are described in https://github.com/fish-shell/fish-shell/issues/568#issuecomment-18455181. The match type (exact/prefix/substring/subsequence) dominates the case differences.

krackers commented 2 years ago

Note that even if the above is fixed (either by ranking smart-case and case-sensitive the same or by changing the reader to permit ranks that differ by case only), I think there'd still be an issue where the reader rejects smartcase completions because they require replacement

If any commands of the best rank do not require replacement, then ignore all those that want to use replacement.

@krobelus Do you know why there exists a distinction between completions that replace the token and ones that don't? From a user perspective why not just treat all of them as replacing token (which would be a visual no-op if the prefix already matches?) Is this just a performance optimization?

krobelus commented 2 years ago

Do you know why there exists a distinction between completions that replace the token and ones that don't?

Right, the distinction only exists internally and does not show up in complete -C output. We should probably try to always use the COMPLETE_REPLACES_TOKEN behavior.

A previous attempt that did something similar was b38a23a46 (Attempt to simplify how completions get presented in the pager, 2020-11-28) but reverted in b9b84e63b (Revert "Attempt to simplify how completions get presented in the pager", 2020-12-02)

To avoid similar issues we should often

try to avoid modifying what the user typed
still escape whatever we add

This might be tricky to reconcile with our existing expand.cpp code.

For example, on

touch file\ with\ spaces
echo file\ with|

completion should give

echo file\ with\ spaces

krackers commented 2 years ago

@krobelus

Hm what if we modify b38a23a by doing something like

   // Ensure that all surviving completions replace their token, so we can handle them uniformly.
    for (completion_t &c : surviving_completions) {
        if (!(c.flags & COMPLETE_REPLACES_TOKEN)) {
            c.flags |= COMPLETE_REPLACES_TOKEN;
            // If current user input includes characters that should not be escaped
            if (escape(tok) != tok) {
                c.flags |= COMPLETE_DONT_ESCAPE;
               c.completion = tok + escape(c.completion,);
            }
            else {
                c.completion.insert(0, tok);
            }
        }
    }

This satisfies both the criteria you mentioned.

mike-lloyd03 commented 1 year ago

I'm migrating to fish from zsh. This thread is a bit old. Has this issue been addressed in elsewhere?

ridiculousfish commented 1 year ago

Doesn't look like it.

krackers commented 1 year ago

Fwiw I did implement the technique in https://github.com/fish-shell/fish-shell/issues/7944#issuecomment-1059597435 and I've been using it since. It's based off the technique in https://github.com/fish-shell/fish-shell/commit/b38a23a46dfac9fdaf453cced61683765790528c but modified to fix the quoting issues and it seemed to work OK for my needs, I recall testing it pretty with whatever weird things I could think of.

But I don't even remember the high-level logic anymore, and the control flow should probably be cleaned up

diff --git a/src/reader.cpp b/src/reader.cpp
index dc88fe4b564..5c705be5bd2 100644
--- a/src/reader.cpp
+++ b/src/reader.cpp
@@ -82,10 +82,6 @@
 // interactive command to complete.
 #define ENV_CMD_DURATION L"CMD_DURATION"

-/// Maximum length of prefix string when printing completion list. Longer prefixes will be
-/// ellipsized.
-#define PREFIX_MAX_LEN 9
-
 /// A simple prompt for reading shell commands that does not rely on fish specific commands, meaning
 /// it will work even if fish is not installed. This is used by read_i.
 #define DEFAULT_PROMPT L"echo -n \"$USER@$hostname $PWD \"'> '"
@@ -1543,12 +1539,14 @@ wcstring completion_apply_to_command_line(const wcstring &val, complete_flags_t
     bool add_space = !bool(flags & COMPLETE_NO_SPACE);
     bool do_replace = bool(flags & COMPLETE_REPLACES_TOKEN);
     bool do_escape = !bool(flags & COMPLETE_DONT_ESCAPE);
+    bool escape_after_token = bool(flags & COMPLETE_ESCAPE_AFTER_TOKEN);
     bool no_tilde = bool(flags & COMPLETE_DONT_ESCAPE_TILDES);

     const size_t cursor_pos = *inout_cursor_pos;
     bool back_into_trailing_quote = false;
     bool have_space_after_token = command_line[cursor_pos] == L' ';

+  wchar_t quote = L'\0';
     if (do_replace) {
         size_t move_cursor;
         const wchar_t *begin, *end;
@@ -1560,8 +1558,39 @@ wcstring completion_apply_to_command_line(const wcstring &val, complete_flags_t
         wcstring sb(buff, begin - buff);

         if (do_escape) {
-            wcstring escaped = escape_string(
-                val, ESCAPE_ALL | ESCAPE_NO_QUOTED | (no_tilde ? ESCAPE_NO_TILDE : 0));
+            wcstring escaped;
+            if (escape_after_token) {
+                // Note that because we only ever set ESCAPE_AFTER_TOKEN after prepending, 
+                // we are guaranteed that tok will be a prefix of val.
+                // The only case in which it's not a strict prefix is 
+                // when we've inserted a prefix under unclosed quote
+                // E.g. Tok: "/Users/foo/Bar/Baz"
+                //      Val: "/Users/foo/Bar/"Baz
+                // But since this just transpoes the quote the logic still works.
+                wcstring tok =  command_line.substr(begin - buff, (end - begin));
+                assert(val.size() >= tok.size());
+                wcstring suffix = wcstring{};
+                size_t edge_trim = 0;
+                if (quote == L'\0' && !append_only && cursor_pos > 0) {
+                    // The entire token is reported as unquoted...see if the last character is an
+                    // unescaped quote.
+                    wchar_t trailing_quote = unescaped_quote(command_line, cursor_pos - 1);
+                    if (trailing_quote != L'\0' && tok.size() > 1) {
+                        quote = trailing_quote;
+                        tok.pop_back();
+                        suffix = wcstring{quote};
+                        edge_trim = 1;
+                    }
+                }
+                //printf("\n\nVal trimmed %ls\n\n", val.substr(tok.size() + edge_trim).c_str());
+                //printf("Has quote %ld\n", edge_trim);
+                //printf("Tilde? %d\n", no_tilde);
+                escaped = tok + parse_util_escape_string_with_quote(val.substr(tok.size() + edge_trim), quote, no_tilde) + suffix;
+                //printf("Escpaed %ls\n", escaped.c_str());
+            }
+            else {
+                escaped = escape_string(val, ESCAPE_ALL | ESCAPE_NO_QUOTED | (no_tilde ? ESCAPE_NO_TILDE : 0));
+            }
             sb.append(escaped);
             move_cursor = escaped.size();
         } else {
@@ -1580,7 +1609,6 @@ wcstring completion_apply_to_command_line(const wcstring &val, complete_flags_t
         return sb;
     }

-    wchar_t quote = L'\0';
     wcstring replaced;
     if (do_escape) {
         // We need to figure out whether the token we complete has unclosed quotes. Since the token
@@ -1884,6 +1912,45 @@ static uint32_t get_best_rank(const completion_list_t &comp) {
     return best_rank;
 }

+/// \return the common string prefix of a list of completions.
+static wcstring extract_common_prefix(const completion_list_t &completions) {
+    bool has_seed = false;
+    wcstring result;
+    // Seed it with the first samecase completion (if any), so that the prefix has the same case as
+    // the command line.
+    for (const completion_t &c : completions) {
+        if (c.match.is_samecase()) {
+            result = c.completion;
+            has_seed = true;
+            break;
+        }
+    }
+
+    for (const completion_t &c : completions) {
+        if (!has_seed) {
+            result = c.completion;
+            has_seed = true;
+            continue;
+        }
+
+        // Allow case insensitive common prefix if our completion was not samecase.
+        bool icase = !c.match.is_samecase();
+        size_t i = 0;
+        size_t max = std::min(c.completion.size(), result.size());
+        for (; i < max; i++) {
+            wchar_t c1 = c.completion[i];
+            wchar_t c2 = result[i];
+            bool chars_match = (c1 == c2 || (icase && towlower(c1) == towlower(c2)));
+            if (!chars_match) {
+                break;
+            }
+        }
+        assert(i <= result.size() && "Shared prefix should not make string longer");
+        result.resize(i);
+    }
+    return result;
+}
+
 /// Handle the list of completions. This means the following:
 ///
 /// - If the list is empty, flash the terminal.
@@ -1905,7 +1972,7 @@ bool reader_data_t::handle_completions(const completion_list_t &comp, size_t tok
     bool success = false;
     const editable_line_t *el = &command_line;

-    const wcstring tok(el->text().c_str() + token_begin, token_end - token_begin);
+    wcstring tok(el->text(), token_begin, token_end - token_begin);

     // Check trivial cases.
     size_t size = comp.size();
@@ -1930,127 +1997,63 @@ bool reader_data_t::handle_completions(const completion_list_t &comp, size_t tok
         return success;
     }

-    auto best_rank = get_best_rank(comp);
-
-    // Determine whether we are going to replace the token or not. If any commands of the best
-    // rank do not require replacement, then ignore all those that want to use replacement.
-    bool will_replace_token = true;
-    for (const completion_t &el : comp) {
-        if (el.rank() <= best_rank && !(el.flags & COMPLETE_REPLACES_TOKEN)) {
-            will_replace_token = false;
-            break;
-        }
-    }

     // Decide which completions survived. There may be a lot of them; it would be nice if we could
     // figure out how to avoid copying them here.
+    auto best_rank = get_best_rank(comp);
     completion_list_t surviving_completions;
     bool all_matches_exact_or_prefix = true;
-    for (const completion_t &el : comp) {
-        // Ignore completions with a less suitable match rank than the best.
-        if (el.rank() > best_rank) continue;
-
-        // Only use completions that match replace_token.
-        bool completion_replace_token = static_cast<bool>(el.flags & COMPLETE_REPLACES_TOKEN);
-        if (completion_replace_token != will_replace_token) continue;
-
-        // Don't use completions that want to replace, if we cannot replace them.
-        if (completion_replace_token && !reader_can_replace(tok, el.flags)) continue;
-
-        // This completion survived.
-        surviving_completions.push_back(el);
-        all_matches_exact_or_prefix = all_matches_exact_or_prefix && el.match.is_exact_or_prefix();
-    }
-
-    if (surviving_completions.size() == 1) {
-        // After sorting and stuff only one completion is left, use it.
-        //
-        // TODO: This happens when smartcase kicks in, e.g.
-        // the token is "cma" and the options are "cmake/" and "CMakeLists.txt"
-        // it would be nice if we could figure
-        // out how to use it more.
-        const completion_t &c = surviving_completions.at(0);
-
-        // If this is a replacement completion, check that we know how to replace it, e.g. that
-        // the token doesn't contain evil operators like {}.
-        if (!(c.flags & COMPLETE_REPLACES_TOKEN) || reader_can_replace(tok, c.flags)) {
-            completion_insert(c.completion, token_end, c.flags);
-        }
-        return true;
-    }
-
-    bool use_prefix = false;
-    wcstring common_prefix;
-    if (all_matches_exact_or_prefix) {
-        // Try to find a common prefix to insert among the surviving completions.
-        complete_flags_t flags = 0;
-        bool prefix_is_partial_completion = false;
-        bool first = true;
-        for (const completion_t &el : surviving_completions) {
-            if (first) {
-                // First entry, use the whole string.
-                common_prefix = el.completion;
-                flags = el.flags;
-                first = false;
-            } else {
-                // Determine the shared prefix length.
-                size_t idx, max = std::min(common_prefix.size(), el.completion.size());
-
-                for (idx = 0; idx < max; idx++) {
-                    if (common_prefix.at(idx) != el.completion.at(idx)) break;
-                }
-
-                // idx is now the length of the new common prefix.
-                common_prefix.resize(idx);
-                prefix_is_partial_completion = true;
-
-                // Early out if we decide there's no common prefix.
-                if (idx == 0) break;
+    for (const completion_t &c : comp) {
+        if (c.rank() > best_rank && !(c.flags & COMPLETE_DISPLAY_ALL)) continue;
+
+        bool completion_replace_token = (c.flags & COMPLETE_REPLACES_TOKEN);
+        if (completion_replace_token && !reader_can_replace(tok, c.flags)) continue;
+        surviving_completions.push_back(c);
+        //printf("\n READER %ls %ls\n", c.completion.c_str(), tok.c_str());
+        all_matches_exact_or_prefix = all_matches_exact_or_prefix && c.match.is_exact_or_prefix();
+    }
+
+    // Ensure that all surviving completions replace their token, so we can handle them uniformly.
+    //  wchar_t trailing_quote = unescaped_quote(el->text(), token_end - 1);
+    //  printf("Trailing quote? %lc\n", trailing_quote);
+    // if (trailing_quote != L'\0') {
+    //     tok.pop_back();
+    // }
+    // printf("Tok %ls\n", tok.c_str());
+    // wcstring suffix = trailing_quote ? wcstring{trailing_quote} : wcstring{};
+
+    for (completion_t &c : surviving_completions) {
+        if (!(c.flags & COMPLETE_REPLACES_TOKEN)) {
+            //printf("Appending\n");
+            c.flags |= COMPLETE_REPLACES_TOKEN;
+            c.completion = /*unescaped*/tok + c.completion /*+ suffix*/;
+            if (!(c.flags & COMPLETE_DONT_ESCAPE)) {
+              c.flags |= COMPLETE_ESCAPE_AFTER_TOKEN;
             }
         }
-
-        // Determine if we use the prefix. We use it if it's non-empty and it will actually make
-        // the command line longer. It may make the command line longer by virtue of not using
-        // REPLACE_TOKEN (so it always appends to the command line), or by virtue of replacing
-        // the token but being longer than it.
-        use_prefix = common_prefix.size() > (will_replace_token ? tok.size() : 0);
-        assert(!use_prefix || !common_prefix.empty());
-
-        if (use_prefix) {
-            // We got something. If more than one completion contributed, then it means we have
-            // a prefix; don't insert a space after it.
-            if (prefix_is_partial_completion) flags |= COMPLETE_NO_SPACE;
-            completion_insert(common_prefix, token_end, flags);
-            cycle_command_line = command_line.text();
-            cycle_cursor_pos = command_line.position();
-        }
     }

-    if (use_prefix) {
-        for (completion_t &c : surviving_completions) {
-            c.flags &= ~COMPLETE_REPLACES_TOKEN;
-            c.completion.erase(0, common_prefix.size());
-        }
-    }
+    // Compute the common prefix (perhaps empty) of all surviving completions, and replace our token
+    // with it if it would make the token longer.
+    // TODO: Might fail when you have mixed escaped & unescaped things. But the old behavior also
+    // used the escape behavior of first elem so it's probably fine?
+    wcstring common_prefix = extract_common_prefix(surviving_completions);
+    if (common_prefix.size() > tok.size()) {
+        complete_flags_t flags = (*surviving_completions.begin()).flags;
+        // Replace the token! Note this invalidates token_begin and token_end.
+        // Do not insert a space if more than one completion contributed.
+        //printf("inserting common prefix %ls\n", common_prefix.c_str());
+        if (surviving_completions.size() > 1) flags |= COMPLETE_NO_SPACE;
+        completion_insert(common_prefix, token_end, flags);

-    // Print the completion list.
-    wcstring prefix;
-    if (will_replace_token || !all_matches_exact_or_prefix) {
-        if (use_prefix) prefix = std::move(common_prefix);
-    } else if (tok.size() + common_prefix.size() <= PREFIX_MAX_LEN) {
-        prefix = tok + common_prefix;
-    } else {
-        // Append just the end of the string.
-        prefix = wcstring{get_ellipsis_char()};
-        prefix.append(tok + common_prefix, tok.size() + common_prefix.size() - PREFIX_MAX_LEN,
-                      PREFIX_MAX_LEN);
+        cycle_command_line = command_line.text();
+        cycle_cursor_pos = command_line.position();
     }

     // Update the pager data.
-    pager.set_prefix(prefix);
-    pager.set_completions(surviving_completions);
+    pager.set_completions(surviving_completions, common_prefix);
     // Invalidate our rendering.
-    current_page_rendering = page_rendering_t();
+    current_page_rendering = page_rendering_t{};
     // Modify the command line to reflect the new pager.
     pager_selection_changed();
     return false;

archelaus commented 1 year ago

I'm surprised there ain't much activity here. I've been noticing this issue since quite a while now, and frankly it's annoying. Like the OP mentioned it's hard to remember off the top of my head whether or not a particular directory starts with a uppercase letter, so IMO tab completion should suggest case-insensitive options as well. Is this something that's being worked on, or has been fixed but I missed it somehow? I <3 Fish but this particular choice seems strange, even Bash doesn't seem to have this issue.

TonyWael commented 1 year ago

I'm surprised there ain't much activity here. I've been noticing this issue since quite a while now, and frankly it's annoying. Like the OP mentioned it's hard to remember off the top of my head whether or not a particular directory starts with a uppercase letter, so IMO tab completion should suggest case-insensitive options as well. Is this something that's being worked on, or has been fixed but I missed it somehow? I <3 Fish but this particular choice seems strange, even Bash doesn't seem to have this issue.

I have to be very honest with you and I totally feel the same. see this behaviour also:

zx8 commented 12 months ago

@razzius @krackers I see you have forks with a fix for this issue. Are you planning to submit it as a PR, or shall we close this as something that won't be implemented?

razzius commented 11 months ago

@zx8 I'd like to submit my patch eventually, however my code loses the prefix underlining/bolding in the tab completion overlay, so effectively it's substituting a functionality bug for a visual bug. To see what I'm talking about, here's screenshots

without my patch:

with my patch:

The tricky part is that if completion options have different capitalization, different strings (same letters but different capitalization) will render as underlined in the completion list, so we can't use the pager.set_prefix function as-is.

As a concrete example, this repository contains multiple files that start with co/CO in its root: config_cmake.h.in and CODE_OF_CONDUCT.md (and others). Ideally a user in the root directory typing co<tab> would be presented with a pager including options of both case, with the matching co/CO underlined in all cases, looking something like (I drew the underlines manually in this mockup):

Screenshot from 2023-12-13 03-04-29

Whereas currently it completes config_cmake.h.in since it case-sensitively (rather than smart case) matches the case of co.

Hopefully this context helps anybody interested in working on this feature and illustrates why it's not so easy. @krackers whose patch above I based mine off of has the same issue as it also removes the pager.set_prefix(prefix); line.

krackers commented 11 months ago

The loss of underline is only a minor visual issue, the more major one with actual functionality loss is that it removes the code to automatically insert the common prefix

The version of my patch based on ridiculousfish's approach actually does preserve both the shared prefix insertion as well as the underline. I tested on your example and it produces the results you expect in your screenshot. Note that common_prefix is passed as an arg to set_completions which is updated according to the logic in ridiculousfish's patch. His logic to determine this common_prefix was also updated to take into account smartcase behavior, see https://github.com/krackers/fish-shell/blob/3.3.1/src/reader.cpp#L1929

Ridiculousfish's original patch did not properly handle escapes though, which is what my attempt tries to hackily fix. The most robust method I could think of was to still use the approach of uniformly using complete_replaces_token (by concating the suggestion to the current token if the replace bit was not set) for populating suggestions, but when inserting suggestions we reconstruct the original suffix if needed from the concatenatd version and only escape that suffix. This feels really hacky though, and there was already one edge-case I had to fix (when you insert a prefix, the current token changes, so you have to explicitly record the beginning of the suffix instead of implicitly getting it by chopping off the current token).

I briefly looked through your patch razzius, and the line c.completion = tok + c.completion; seems a bit suspect to me because it means that when inserting tok might get escaped when it shouldn't be, I think the same issue that ridiculousfish's original patch ran into. I'm not 100% sure though...

limaceous-bushwhacker commented 7 months ago

The loss of underline is only a minor visual issue, the more major one with actual functionality loss is that it removes the code to automatically insert the common prefix

The version of my patch based on ridiculousfish's approach actually does preserve both the shared prefix insertion as well as the underline. I tested on your example and it produces the results you expect in your screenshot. Note that common_prefix is passed as an arg to set_completions which is updated according to the logic in ridiculousfish's patch. His logic to determine this common_prefix was also updated to take into account smartcase behavior, see https://github.com/krackers/fish-shell/blob/3.3.1/src/reader.cpp#L1929

Ridiculousfish's original patch did not properly handle escapes though, which is what my attempt tries to hackily fix. The most robust method I could think of was to still use the approach of uniformly using complete_replaces_token (by concating the suggestion to the current token if the replace bit was not set) for populating suggestions, but when inserting suggestions we reconstruct the original suffix if needed from the concatenatd version and only escape that suffix. This feels really hacky though, and there was already one edge-case I had to fix (when you insert a prefix, the current token changes, so you have to explicitly record the beginning of the suffix instead of implicitly getting it by chopping off the current token).

I briefly looked through your patch razzius, and the line c.completion = tok + c.completion; seems a bit suspect to me because it means that when inserting tok might get escaped when it shouldn't be, I think the same issue that ridiculousfish's original patch ran into. I'm not 100% sure though...

Hey, is there a (hopefully, beginner friendly) way to use your patch? Doesn't look there's been much progress in here officially.

krackers commented 7 months ago

(hopefully, beginner friendly) way to use your patch

No, unfortunately not, since it's now several versions out of date. The version linked above is for 3.3.1, you would have to try rebasing it on top of the last C++ branch.

fish-shell / fish-shell