Smart case search - GitHub Issues

fenuks commented 3 years ago

Hello, vim has a very handy option, `smartcase`, which makes a search case-sensitive if the query contains any capital letter and case-insensitive otherwise. I have a directory called V, and if I type `z V` I am moved somewhere else, e.g. into the `~/.config/nvim` directory. I have to use the longer `z parent/V` to give z a hint. Therefore, I think it makes sense for z to support the notion of smart case as well; it would make the tool more comfortable to use.

ajeetdsouza commented 2 years ago

One question here is -- should z doc Foo match ~/Documents/Foo? I'd think not -- either the whole query should be smartcase or none of it should be.

I also think normalizing diacritics (café -> cafe) would be great. Possible queries should be:

Accent normalization:

* `z cafe` matches `~/Pictures/Café`
* `z café` does not match `~/Pictures/Cafe`

Case normalization:

* `z cafe` matches `~/Pictures/Cafe`
* `z Cafe` does not match `~/Pictures/cafe`
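
A minimal Python sketch of the four rules above, plus the "whole query or nothing" rule for `z doc Foo`. It is an illustration only, not zoxide's implementation: the function names `strip_accents` and `smart_match` are hypothetical, and matching is simplified to "every keyword is a substring of the path", which is an assumption.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks, so that 'Café' becomes 'Cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def smart_match(keywords, path):
    """Hypothetical matcher: smart case and smart accents, decided per query."""
    query = " ".join(keywords)
    # The whole query becomes case-sensitive if any keyword has a capital.
    case_sensitive = any(ch.isupper() for ch in query)
    # The whole query becomes accent-sensitive if it contains any diacritic.
    accent_sensitive = any(
        unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", query)
    )

    def fold(text):
        text = unicodedata.normalize("NFC", text)
        if not accent_sensitive:
            text = strip_accents(text)
        if not case_sensitive:
            text = text.lower()
        return text

    folded_path = fold(path)
    # Simplification (assumption): each keyword only has to occur somewhere.
    return all(fold(keyword) in folded_path for keyword in keywords)
```

With these assumptions, `smart_match(["doc", "Foo"], "~/Documents/Foo")` is false, because the capital in `Foo` makes `doc` case-sensitive too.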

fenuks commented 2 years ago

> One question here is -- should z doc Foo match ~/Documents/Foo? I'd think not -- either the whole query should be smartcase or none of it should be.

I agree, it should be all or nothing. Or perhaps there should be two types of switches: one that applies case-sensitive search globally if there is at least one capital letter anywhere in the search query, and another that infers smart case for each word of the query individually.
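
The difference between the two switches can be sketched in Python as follows. Both function names are hypothetical, and matching is reduced to a plain substring test as a simplifying assumption; this is not how zoxide matches paths.

```python
def has_upper(text):
    """True if the text contains at least one capital letter."""
    return any(ch.isupper() for ch in text)


def keyword_matches(keyword, path, case_sensitive):
    """Substring test, optionally ignoring case."""
    if case_sensitive:
        return keyword in path
    return keyword.lower() in path.lower()


def match_global(keywords, path):
    """One capital anywhere makes every keyword case-sensitive."""
    sensitive = any(has_upper(keyword) for keyword in keywords)
    return all(keyword_matches(k, path, sensitive) for k in keywords)


def match_per_word(keywords, path):
    """Each keyword decides its own case sensitivity."""
    return all(keyword_matches(k, path, has_upper(k)) for k in keywords)
```

Under these assumptions, `z doc Foo` against `~/Documents/Foo` fails with `match_global` (the lowercase `doc` is compared case-sensitively against `Documents`) but succeeds with `match_per_word`.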

> I also think accent normalization (café -> cafe) would be great. Possible queries should be:
>
> Accent normalization:
>
> * `z cafe` matches `~/Pictures/Café`
> * `z café` does not match `~/Pictures/Cafe`
>
> Case normalization:
>
> * `z cafe` matches `~/Pictures/Cafe`
> * `z Cafe` does not match `~/Pictures/cafe`

That would be great to have as well. I happen to speak a language with diacritics, but if I had to choose only one of the two, it would be smart case. ;)

kidonng commented 2 years ago

Somewhat related: #114

As for me, I would like zoxide to match case-insensitively all the time, with an option to enable case-sensitive matching. Smart case seems overkill.

ajeetdsouza commented 2 years ago

@kidonng I'm curious as to why you'd want to disable smartcase matching. I wouldn't expect anyone to use uppercase in a query unless they were hoping for results with the same uppercase letters in them.

PurpleMyst commented 2 years ago

I'm willing to work on this, if that's ok! I'll be trying to send in a PR by tomorrow.

ajeetdsouza commented 2 years ago

@PurpleMyst there's already a pending PR on improving search which will very likely conflict with this. I haven't really had time to look into it yet, but for now, I'd recommend against creating a separate PR.

lefth commented 2 years ago

> @kidonng I'm curious as to why you'd want to disable smartcase matching. I wouldn't expect anyone to use uppercase in a query unless they were hoping for results with the same uppercase letters in them.

I seldom disable smart case in any program, but I sometimes want a true case-insensitive search when I search for copied text that contains capitals. For example, if I want to search for "armv7_unknown" but copied the text from: `CARGO_TARGET_ARMV7_UNKNOWN_LINUX_MUSLEABIHF_LINKER=arm-linux-musleabihf-gcc`.

And case sensitivity (no smart case) is useful when there's a great pollution of upper case strings but you want to find a lower case string.

I suggest the option to disable smart case can be deferred until someone insistently asks for it, since (though it's important) it's so uncommonly used. It may never be an issue in zoxide since zoxide's use case is partly to avoid copying long strings. And because zoxide works on paths rather than codebases, there will be less collision between different cases. Paths don't have the same case convention issues as code (different kinds of tokens having different capitalization).

lefth commented 2 years ago

BTW, I just implemented smart case matching in my branch. If you want to try my version with smart case and new keyword-based scoring (I think these features will make it into the official version at some point), you can install it with: `cargo install zoxide --git https://github.com/lefth/zoxide`

dedebenui commented 5 months ago

Somewhat related to that, a lot of characters with diacritics have several representations, for example é can be U+0065 U+0301 or simply U+00E9. It would be nice if zoxide could merge those, perhaps by running everything (both queries and file/dir lists) through some normalization transformation. Right now, at least on macOS, if my folder is named café (U+0065 U+0301, the preferred encoding when renaming things in Finder) and I z café (U+00E9, what actually gets typed in my shell), I get no match.
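
The mismatch described above, and the suggested fix of normalizing both sides, can be reproduced with Python's standard `unicodedata` module. This is only an illustration of the idea; the helper name `same_name` is hypothetical, and the choice of NFC rather than NFD as the common form is an assumption.

```python
import unicodedata

# Decomposed form: 'e' (U+0065) followed by a combining acute accent (U+0301).
decomposed = "cafe\u0301"
# Precomposed form: a single 'é' code point (U+00E9).
precomposed = "caf\u00e9"


def same_name(a, b):
    """Compare two names after bringing both to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ even though they render identically.
raw_equal = decomposed == precomposed
normalized_equal = same_name(decomposed, precomposed)
```

Here `raw_equal` is false and `normalized_equal` is true, which is why normalizing both the query and the stored paths would make `z café` find the folder.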

etiennepellegrini commented 2 months ago

Was any progress made on this, or on merging @lefth's branch into the project? I would really enjoy having the smartcase ability (less interested in diacritics for now).

lefth commented 2 months ago

@etiennepellegrini I haven't pushed to integrate my change, largely because I don't have a solid framework for deciding which matches are best.

We could start that effort and gradually improve it by adding a test that lists a lot of filenames and several input strings and confirms that each best match is correct. But there are a lot of arbitrary decisions to be made. Is the input string "Key" a better match for "Keystore" or "key" (or .keyStore, for that matter)? Is a full word match, full string match, or exact case match considered best?

How do other flexible database search engines rank matches? Is there a term of art that describes this ranking so I can search for more info?

etiennepellegrini commented 2 months ago

Those are good questions -- I think your idea of figuring out a list of edge cases is a good step. Many (hopefully most?) of these decisions may have been decided before, on other projects where smartcase is used.

I don't know enough about the internal workings of zoxide to really help, but here's the way I think about zoxide and how I'd answer the questions you're asking:

* the input string is used to filter the database
* the return value is the database entry with the highest score (so the "quality" of the match isn't really determined by the input string)
* then, if the input string contains a capital letter, only match directories with the exact same case (I picture it as if the database were a text file with all directories sorted according to their score, and I was using vim to search for the input key, returning the first match)

So in this scenario, if you have .keyStore, .keystore, and .Keystore directories:

* `z Key` would only match .Keystore
* `z key` would match all three and return the highest-rated one
* `z kEy` doesn't match anything
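
The model described above (filter with smart case, then return the highest-scored entry) can be sketched in a few lines of Python. The `lookup` function, the dictionary layout, and the scores are all made up for illustration and are not taken from zoxide.

```python
def lookup(database, keyword):
    """Return the highest-scored path matching the keyword, or None.

    `database` maps a path to its score (hypothetical layout).
    """
    # Smart case: a capital anywhere in the keyword forces an exact-case match.
    case_sensitive = any(ch.isupper() for ch in keyword)

    def matches(path):
        if case_sensitive:
            return keyword in path
        return keyword.lower() in path.lower()

    candidates = [(score, path) for path, score in database.items() if matches(path)]
    if not candidates:
        return None
    # The input only filters; the stored score alone decides the winner.
    return max(candidates)[1]


# Scores below are invented for the example.
database = {"~/.keyStore": 3.0, "~/.keystore": 8.0, "~/.Keystore": 5.0}
```

With this made-up database, `lookup(database, "Key")` returns `~/.Keystore`, `lookup(database, "key")` returns `~/.keystore` (the highest score among all three), and `lookup(database, "kEy")` returns `None`.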

lefth commented 2 months ago

That is reasonable, especially as a first step. But the engineer in me says at a minimum, the logic should be able to distinguish among directories named "Audio", "Audiobooks", and "Books". (Also, if that doesn't work it's not even as capable as the proof of concept version in my branch.) So maybe we can start with a small test and few heuristics and add more later if there's an issue.
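
One possible shape for such a small test, sketched in Python. The three-level heuristic in `rank` (exact name, then prefix, then substring) is purely an assumption chosen for the example, not the scoring used in zoxide or in the branch mentioned above, and the expected answers in `CASES` encode that assumption.

```python
def rank(keyword, name):
    """Hypothetical match quality: higher is better, 0 means no match."""
    # Smart case: compare case-insensitively unless the keyword has a capital.
    if not any(ch.isupper() for ch in keyword):
        keyword, name = keyword.lower(), name.lower()
    if name == keyword:
        return 3  # full string match
    if name.startswith(keyword):
        return 2  # prefix match
    if keyword in name:
        return 1  # substring match
    return 0


def best_match(keyword, names):
    """Return the best-ranked name, or None if nothing matches."""
    score, name = max(((rank(keyword, n), n) for n in names), key=lambda p: p[0])
    return name if score > 0 else None


NAMES = ["Audio", "Audiobooks", "Books"]

# Table of (input string, expected best match) under the assumed heuristic.
CASES = [
    ("Audio", "Audio"),
    ("audiob", "Audiobooks"),
    ("books", "Books"),
    ("Books", "Books"),
    ("xyz", None),
]

failures = [(q, want) for q, want in CASES if best_match(q, NAMES) != want]
```

New edge cases can then be added as rows in `CASES`, and each change to the heuristic is checked against the whole table.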

ajeetdsouza / zoxide

Smart case search #224