ghost opened this issue 6 years ago
counsel-file-jump (not counsel-jump-file) is currently synchronous, but it would be nice to make it asynchronous, indeed.
I agree that making counsel-file-jump async would be a good improvement. In the meantime, I suggest counsel-fzf as a better-performing alternative. One potential downside applies if you don't like fuzzy matching: fzf's matcher is similar to ivy--regex-fuzzy.
Even so, running counsel-fzf on the home folder isn't a good idea: since there's no index, the whole file tree will be scanned, and that takes too long. But then we have counsel-locate, which relies on a system index. I just finished running time find ~ on my non-SSD laptop; it took 9m29s. counsel-locate is so much better at looking for files in the home directory that it might be a good idea to switch to it automatically when the home directory is detected.
Thanks, @abo-abo. counsel-fzf is nice. How is it that counsel-fzf is asynchronous but counsel-file-jump is not? counsel-file-jump uses find and counsel-fzf uses fzf -f. Does it have something to do with fzf itself, like it sends a fixed-size buffer to read or something? Just curious to know.
Overall, I think doing counsel-git in git directories and counsel-fzf in non-git directories seems to be a good solution. I'll keep counsel-locate in mind too. Usually, I don't search in ~!
Allowing candidates to be passed in and narrowed asynchronously (much like plain fzf, which doesn't wait until the command that produces the candidates has finished) would be a nice improvement.
The problem is that the find utility does not accept regular expressions, which one would expect given the search-as-you-type behavior of this package. It uses a globbing syntax instead, for which we would need a crippled translator. That's the reason why we first consume all output of the find utility and only then filter it with Emacs regular expressions.
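To illustrate the mismatch with a tiny example (the directory and file names here are invented for the demo): find's -name takes shell globs, while a search-as-you-type regex has to be applied by a separate filter such as grep.

```shell
# find's -name accepts shell globs, not regular expressions
mkdir -p /tmp/find-glob-demo && cd /tmp/find-glob-demo
touch notes.org notes.txt config.org
find . -type f -name '*.org'            # glob filtering works fine
# a regex such as 'co.*org' is not valid -name syntax,
# so the output is piped through grep instead:
find . -type f | grep -E 'co.*org'      # matches only ./config.org
```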
The problem is that the find utility does not accept regular expressions, which one would expect given the search-as-you-type behavior of this package.
I don't see how regular expressions relate to fuzzy matching and asynchronous candidate narrowing.
That's the reason why we first consume all output of the find utility and only then filter it with Emacs regular expressions.
fzf works with find via a pipe, and it doesn't need to wait for the full list of candidates to narrow it or do fuzzy matching.
Exactly; hence, you are essentially asking for a reimplementation of FZF in Emacs Lisp. The profit is questionable.
Exactly; hence, you are essentially asking for a reimplementation of FZF in Emacs Lisp. The profit is questionable.
No, I don't. We already have fuzzy matching in Ivy. It just doesn't accept candidates over time, and waits for the full list of candidates.
Emacs is single-threaded; how will you achieve that?
Emacs has the builtin eshell written in Elisp, which doesn't block Emacs while running find, so I guess there are ways to start a process and take its output periodically.
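As a shell-level sketch of the same idea (the directory and file names are invented for the demo): a consumer can read a producer's output before the producer has finished, which is what an Emacs process filter does with find's output.

```shell
# start find in the background and consume its output while it may still be running
mkdir -p /tmp/async-find-demo && cd /tmp/async-find-demo
touch a.txt b.txt
find . -type f > /tmp/async-find-results.txt &   # producer runs concurrently
pid=$!
# a reader could already inspect partial results at this point:
head -n 1 /tmp/async-find-results.txt
wait "$pid"                                      # the full list exists only now
wc -l < /tmp/async-find-results.txt
```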
Sure, but in this case you would search for everything and then filter in Emacs. That means you will potentially blow up memory consumption, suffer from GC, and it will be slow anyway, since reading a portion of output in some time window and then filtering it will be sequential (no threads). Furthermore, if you change the input, we have to start filtering that huge collection of candidates from scratch. This will not scale.

A better approach is to do what I mentioned earlier: we need to retrigger the find invocation as you type, on each character. FZF async integration works the same way. The problem then is the filtering syntax of the find utility. The solution is to either implement a translator, or take a bold move and simply pipe the output through the grep utility first, which is easy to apply filtering with, and only plug the resulting filtered output into Emacs. The advantages in ease of implementation and performance are obvious.
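A minimal sketch of the pipeline being proposed, re-run for each new input (the sample tree and the pattern 'ma' are invented for the demo):

```shell
# build a tiny sample tree
mkdir -p /tmp/jump-pipe-demo/src && cd /tmp/jump-pipe-demo
touch src/main.c src/util.c README.md
# on every keystroke, rebuild the regex from the input and rerun the pipeline;
# cut -c 3- strips the leading "./", grep leaves only matches for Emacs to display
find . -name '.git' -prune -o -type f -print | cut -c 3- | grep -E 'ma'
# -> src/main.c
```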
Sure, but in this case you would search for everything and then filter in Emacs.
And that's exactly how counsel-file-jump works. But it waits for the full list of candidates instead of accepting them over time.
And your initial proposal will not improve it much, see above.
And your initial proposal will not improve it much, see above.
Oh, it would be a great time-saving improvement. Instead of waiting 8 seconds for 24k candidates every time I invoke this command, I might get the first 3k in 1 second, and the file I'm looking for may already be listed among those, so I can open it before the full list is built and filtered. And I can type my fuzzy string during that first second as well. Currently I need to wait 8 seconds, plus the 1 second when I actually type what I want. Even if my file is at the very end of the candidate list, in the worst case it will take the same time as the current implementation, minus the one second spent typing.
I'd rather sacrifice machine resources to get my file faster, instead of waiting every time for no reason: the file I'm looking for might be among the first hundred candidates, yet I have to wait for all 24k.
Did you read through my comparison between what you propose and what I propose? I'm not saying that the current implementation is good or optimal. Forget about it; what I'm saying is that your proposal will not fly much further. Go ahead and implement it if you are interested. That implementation will be unnecessarily convoluted and will underperform compared to external tools doing all of the heavy lifting. Replacing the counsel-{file,directory}-jump implementation with an asynchronous find ... | grep ... will do the trick.
To take it further, I could replace grep with rg, for example, and as a result benefit from parallelized filtering, which you will never get in Emacs Lisp.
If you're proposing that counsel-file-jump work by starting a new process on every keystroke, then that is the absolute worst way of doing fuzzy matching over a list of files, especially when we already see that counsel-rg isn't really performant because of that. Forking a process on each keystroke will make things even worse on slow CPUs and memory, because the time to shut an application down is significant. It doesn't matter how fast rg filters if it is slow to start and finish. And I experience such problems with tools like running rg in skim -i -c, which essentially runs a new rg on every keypress, and that is a pretty fast tool written in a pretty fast language compared to Elisp.
Let's put it simply: are you happy with counsel-fzf?
No, I'm using counsel-file-jump because it actually works faster. Since counsel-fzf recalculates the whole list on every keypress, if you've typed a fuzzy string that doesn't match and need to delete it, you wait the whole time again for all the relevant candidates.
On slow CPUs and memory, your approach will be even slower. The point is to isolate as much resource allocation outside Emacs as possible, because Emacs does not do well combining user interactivity with heavy computational/memory load. Termination of the child can be done asynchronously, so that is not really an argument here. I don't know what kind of system/hardware you're running, but I'd say that arguments about forking a process on Unix on every keystroke, from a human-interactivity perspective, sound ridiculous in the present decade.
Since counsel-fzf recalculates the whole list on every keypress, if you've typed a fuzzy string that doesn't match and need to delete it, you wait the whole time again for all the relevant candidates.
How does your Emacs Lisp candidate filtering alleviate this issue? When you change the input pattern, somebody has to refilter the whole original input collection and rebuild the filtered collection from scratch to display it. The difference is that with your proposal, Emacs will be choking doing all that heavy filtering in one thread, on top of the large original input collection (e.g. when you invoke find from /), on each keystroke. (Yes, that huge input collection has to be read only once by Emacs, but so what? That's not the only bottleneck here, and since the input collection is still of the same huge size as in the current implementation, it does not improve much.) In my approach, Emacs will only consume a relatively small portion of already filtered (potentially in parallel) output, asynchronously, in order to display it to the user.
The point is to isolate as much resource allocation outside Emacs as possible, because Emacs does not do well combining user interactivity with heavy computational/memory load.
Then we need a proper tool, and to call it in a place where we do not have to worry about managing allocations, doing heavy computations, and managing the display of the results. Oh wait, that means we just need to run fzf from the terminal. Ivy is extremely performant at showing and filtering a really big number of files once the list of candidates is complete. I'm talking about filtering the worst case (the last file in the list, matched by an 84-letter fuzzy string pasted from the clipboard) out of 90k files. It happens immediately. The only problems here are that building the list of 90k files takes more time than the filtering, and that Ivy doesn't seem to do incremental matching.
How does your Emacs Lisp candidate filtering alleviate this issue? When you change the input pattern, somebody has to refilter the whole original input collection and rebuild the filtered collection from scratch to display it.
Incremental filtering gets faster with each keystroke, since fewer candidates are filtered. Caching previous results helps when deleting. In contrast, when we invoke a new process on each keystroke, we build everything from scratch, because the tools we're using are not implemented with incremental usage in mind. Of course, when we insert a completely new pattern, we have to do all the computing from the ground up in both variants, but incremental filtering is still much faster.
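The argument can be sketched in shell terms (file names invented for the demo): when the pattern only grows more restrictive, each keystroke can legitimately filter the previous, smaller result set instead of the full collection.

```shell
mkdir -p /tmp/incr-demo && cd /tmp/incr-demo
printf '%s\n' config.org notes.md main.c > all.txt   # full list, produced once
grep 'o'  all.txt   > step1.txt   # after typing "o": config.org, notes.md
grep 'or' step1.txt > step2.txt   # after typing "r": refine the cached subset
cat step2.txt                     # -> config.org
```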
Since my papers are not yet signed by my employer, I will simply leave the code here. To prove my point, the following is a blazing-fast replacement for the counsel-*-jump functions:
(require 'counsel)
(defun counsel--projectile-project-root ()
  "Return root of current project or nil on failure.
Use `projectile-project-root' to determine the root."
  (and (fboundp 'projectile-project-p)
       (fboundp 'projectile-project-root)
       (projectile-project-p)
       (projectile-project-root)))
(defvar counsel-filter-file-root-functions
  '(counsel--projectile-project-root
    counsel--project-current
    counsel--git-root
    counsel--configure-root
    counsel--dir-locals-root)
  "Special hook to find the project root for `counsel-filter-file'.
Each function on this hook is called in turn with no arguments
and should return either a directory, or nil if no root was
found.")
(defun counsel-filter-file-root ()
  "Return root of current project or `default-directory'.
The root is determined by `counsel-filter-file-root-functions'."
  (or (run-hook-with-args-until-success 'counsel-filter-file-root-functions)
      default-directory))
(defcustom counsel-filter-file-dir-function #'counsel-filter-file-root
  "Function that returns a directory for `counsel-filter-file'."
  :type 'function)

(defcustom counsel-filter-file-find-args "-name '.git' -prune -o -type f -print"
  "Arguments for the `find-command' for `counsel-filter-file'."
  :type 'string)

(defcustom counsel-filter-file-grep-args "-E -e %s"
  "Arguments for the `grep-command' for `counsel-filter-file'."
  :type 'string)

(defvar counsel-filter-file-command nil
  "Command for `counsel-filter-file'.")

(defvar counsel-filter-file-dir nil
  "Directory for `counsel-filter-file'.")
(defun counsel-filter-file-function (string)
  (or (ivy-more-chars)
      (let ((default-directory counsel-filter-file-dir)
            (regex (replace-regexp-in-string
                    "\n" "" (counsel--grep-regex string))))
        (counsel--async-command (format counsel-filter-file-command
                                        (shell-quote-argument regex)))
        nil)))
(defun counsel-filter-file-action (f)
  (with-ivy-window
    (let ((default-directory counsel-filter-file-dir))
      (find-file f))))
;;;###autoload
(defun counsel-filter-file (&optional initial-input initial-directory prompt action)
  (interactive (list nil
                     (when current-prefix-arg
                       (read-directory-name "From directory: "))))
  (counsel-require-program find-program)
  (counsel-require-program grep-program)
  (setq counsel-filter-file-command
        (mapconcat #'identity
                   (list find-program
                         " . "
                         counsel-filter-file-find-args
                         (when (fboundp 'make-process)
                           (concat " 2> " (shell-quote-argument null-device)))
                         " | cut -c 3- | "
                         grep-program
                         counsel-filter-file-grep-args
                         (unless (string-match-p "%s"
                                                 counsel-filter-file-grep-args)
                           " %s"))
                   " "))
  (setq counsel-filter-file-dir
        (or initial-directory
            (if (functionp counsel-filter-file-dir-function)
                (funcall counsel-filter-file-dir-function)
              default-directory)))
  (ivy-read (or prompt "Filter file: ")
            #'counsel-filter-file-function
            :initial-input initial-input
            :dynamic-collection t
            :preselect (counsel--preselect-file)
            :history 'file-name-history
            :keymap counsel-find-file-map
            :action (or action #'counsel-filter-file-action)
            :unwind #'counsel-delete-process
            :caller 'counsel-filter-file))
(defcustom counsel-filter-directory-find-args "-name '.git' -prune -o -type d -print"
  "Arguments for the `find-command' for `counsel-filter-directory'."
  :type 'string)

(defcustom counsel-filter-directory-grep-args "-E -e %s"
  "Arguments for the `grep-command' for `counsel-filter-directory'."
  :type 'string)

(defun counsel-filter-directory-action (d)
  (with-ivy-window
    (let ((default-directory counsel-filter-file-dir))
      (dired-jump nil (expand-file-name d)))))

;;;###autoload
(defun counsel-filter-directory (&optional initial-input initial-directory prompt action)
  (interactive (list nil
                     (when current-prefix-arg
                       (read-directory-name "From directory: "))))
  (let ((counsel-filter-file-find-args counsel-filter-directory-find-args)
        (counsel-filter-file-grep-args counsel-filter-directory-grep-args))
    (counsel-filter-file initial-input
                         initial-directory
                         (or prompt "Filter directory: ")
                         (or action #'counsel-filter-directory-action))))
Please feel free to test and integrate it if you like.
To prove my point, the following is a blazing-fast replacement for the counsel-*-jump functions
This code works, but it doesn't provide any sort of fuzzy matching; therefore it is completely obliterated by any counsel-*-jump function. And since there's already counsel-fzf, which does the same thing while allowing fuzziness, I don't see how this code is any better.
This code works, but it doesn't provide any sort of fuzzy matching; therefore it is completely obliterated by any counsel-*-jump function. And since there's already counsel-fzf, which does the same thing while allowing fuzziness, I don't see how this code is any better.
I've just changed the code to support any regular expression builder, including the fuzzy one, ivy--regex-fuzzy. Nothing is obliterated by anything. counsel-*-jump is an unacceptable solution in the long run; the above works much better in every respect. It also has the additional advantage of portability compared to fzf: not every box has fzf installed. A very common example is restricted corporate environments with remote boxes where one works via tramp, to which I and many others are exposed a lot during regular working hours.
On a side note, you criticize a lot, but don't implement the solution you are advocating for. Why don't you go ahead and implement it so that we can profile and compare which one performs better on huge collections and which implementation is simpler?
@abo-abo, if you decide to integrate, then my last comment from #1812 applies again. In brief, if we let external tools do the fuzzy matching, as is done for the above counsel-filter-* functions and is already done for the counsel-*g functions, then for better relevance of the fuzzy-filtered output returned by these external tools, we should sort it in some clever way using a scoring algorithm, applied as the last post-processing stage in Emacs itself. I'm not sure whether flx is well suited for this or whether we need to roll our own. Does ivy offer any facility to perform such sorting on an arbitrary candidate collection?
On a side note, you criticize a lot, but don't implement the solution you are advocating for. Why don't you go ahead and implement it so that we can profile and compare which one performs better on huge collections and which implementation is simpler?
I guess "you don't have to be a chef to criticize the food" is a valid point.
Originally I just came here because there was a discussion of a feature that I, as a user, am interested in. I'm not an Elisp programmer; I don't know the language well enough to implement such a thing, which doesn't restrict me in any way from criticizing stuff.
I fully understand that "if you criticize someone's solution, offer your own" is absolutely one hundred percent valid too, but as I said, I don't know Lisp well enough, and my initial interest was in the ability to accept candidates asynchronously, since that can also be used by other sources like projectile, which was discussed there too: https://github.com/bbatsov/projectile/issues/1307. Since this particular issue is related to that problem, I decided to add my vote for the feature in the comments.
I've just changed the code to support any regular expression builder, including the fuzzy one, ivy--regex-fuzzy
I'm going to try it out then
Nothing is obliterated by anything. counsel-*-jump is an unacceptable solution in the long run; the above works much better in every respect.
There's one point I'm not sure about. I use a lot of servers simultaneously and work with remote files via different programs quite frequently, which may not feature remote file access. I use sshfs for that, since it is a universal way for all programs to access remote files, and it's pretty fast. For example, if a colleague wants to do something on my PC with me, I can fire up his favorite editor for him and provide access to the files natively, as on his own PC. So sshfs is a must for my workflow, and I can't abandon it. Wouldn't starting the find process over an sshfs-mounted volume be less performant compared to a solution like counsel-*-jump, where we obtain the list of files once? I'm not sure, since I don't know the mechanics of sshfs mounting well enough, and how often it re-fetches files; wouldn't every call to find initiate a fetch and slow everything down?
A very common example is restricted corporate environments with remote boxes where one works via tramp.
That, by the way, is one of the reasons why I use sshfs: I can have any program on my PC and work with the server files without installing anything on the server, which also has a horribly outdated package base.
With respect to sshfs, that's an interesting question. I guess it will all depend on the mounting options/parameters related to caching. It's no different from NAS in this respect. When using transparently mounted file systems for any type of work, one has to take into account what kind of access patterns are expected and configure the mounting options accordingly. Of course, there is nothing we can do about it from Emacs; it's a system configuration problem. All one can do is check the manual and profile. I'd be interested in your results.
I'd say arguments about forking a process on Unix on every keystroke from human interactivity perspective sound ridiculous in present decade.
I did not follow the whole thing, but Windows users don't have the same story here.
Yes, they don't. However, this needs to be profiled, and it needs to be proven that your typing experience while entering the input pattern is impaired on Windows by process spawning. Otherwise, all these discussions are speculation.
When using transparently mounted file systems to do any type of work, one has to take into account what kind of access patterns are expected and configure mounting options accordingly.
Some notes about it. I use fzf every day (not counsel-fzf, but my own script for another editor, called Kakoune) to access files on an sshfs file system with no issues, but it runs find only once and filters the whole thing. However, when using grep or even ripgrep via skim over an sshfs file system, I experience serious lag, compared to less noticeable lag (and practically no lag with rg) on local storage.
Could you test the performance of FZF_DEFAULT_COMMAND=rg --files --hidden --no-ignore --no-messages --smart-case --glob '!.git/*'?
Could you test the performance of FZF_DEFAULT_COMMAND=rg --files --hidden --no-ignore --no-messages --smart-case --glob '!.git/*'?
I'm using fd instead of find, as it respects .gitignore. I'll be able to test it with rg tomorrow.
Also, there's another problem with your filter function. When I filter files with counsel-file-jump, I need to type only coorg to get config.org first in the list. When I'm using counsel-fzf or plain fzf, I also need to type only coorg to get config.org first in the list. But since grep does arbitrary matching and no sorting, coorg matches config.org, but it ends up far from the beginning, which isn't quite right. And even when I input the full match config.org, it's still not the first candidate:
counsel-filter-file with full match (with fuzzy it's even lower in the list): (screenshot)
counsel-file-jump with fuzzy match: (screenshot)
counsel-file-jump with full match: (screenshot)
counsel-fzf with fuzzy match (I'm not using it with full matches since there's no reason to): (screenshot)
Obviously, that's because there are a hell of a lot of .config/ matches, but fzf handles that perfectly by understanding that I'm searching for a file that might be in a directory, not searching for a directory. And there are even smarter matchers out there.
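The ranking gap can be reproduced with plain grep (the candidate paths are invented for the demo). A fuzzy input such as coorg translates into a greedy regex along the lines of what ivy--regex-fuzzy produces; grep then matches it, but keeps plain input order with no relevance scoring:

```shell
# turn "coorg" into the regex c.*o.*o.*r.*g
regex=$(printf '%s\n' coorg | sed 's/./&.*/g; s/\.\*$//')
echo "$regex"                                    # -> c.*o.*o.*r.*g
# both candidates match, but grep reports them in input order:
# the directory hit comes first even though config.org is the better match
printf '%s\n' .config/tools/prog.rc config.org | grep -E "$regex"
```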
This is exactly the issue that I referred to in the comment above (see also the end of #1812). By the way, rg is faster than fd for file traversal (because of its parallel iterator). Let's see your results tomorrow.
Could you test the performance of FZF_DEFAULT_COMMAND=rg --files --hidden --no-ignore --no-messages --smart-case --glob '!.git/*'?
I did some arbitrary measurements, like so:
$ fd . --no-ignore --type f --hidden --exclude .git --exclude .svn | wc -l
75382
$ rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' | wc -l
100738
$ for i in 1 2 3 4 5; do time fd . --no-ignore --type f --hidden --exclude .git --exclude .svn >/dev/null; done
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null 0.71s user 1.77s system 17% cpu 14.422 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null 0.40s user 0.91s system 110% cpu 1.187 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null 0.50s user 0.87s system 116% cpu 1.178 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null 0.42s user 0.93s system 112% cpu 1.195 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null 0.46s user 0.88s system 117% cpu 1.152 total
$ for i in 1 2 3 4 5; do time rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' >/dev/null; done
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob > 1.61s user 3.38s system 21% cpu 23.116 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob > 1.46s user 3.05s system 21% cpu 20.892 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob > 0.92s user 1.77s system 95% cpu 2.825 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob > 0.91s user 1.85s system 81% cpu 3.387 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob > 0.72s user 2.10s system 73% cpu 3.827 total
This is a scan of the whole remote /home/user/ directory. fd is generally faster than rg. As for counsel-fzf performance, I feel that it is better with fd too. I don't know how to measure the performance of an Emacs command.
When scanning a remote project this way, the results are similar, except that rg was a little bit faster, but I guess I need to drop caches somehow before the first run to get proper results:
$ fd --hidden --follow --no-ignore --exclude .svn --exclude .git | wc -l
10952
$ rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' | wc -l
9240
$ for i in 1 2 3 4 5; do time fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn >/dev/null; done
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn > 0.07s user 0.25s system 35% cpu 0.900 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn > 0.03s user 0.15s system 86% cpu 0.199 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn > 0.07s user 0.11s system 87% cpu 0.202 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn > 0.05s user 0.14s system 88% cpu 0.207 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn > 0.07s user 0.11s system 89% cpu 0.207 total
$ for i in 1 2 3 4 5; do time rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' >/dev/null; done
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo 0.11s user 0.12s system 88% cpu 0.259 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo 0.07s user 0.11s system 87% cpu 0.201 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo 0.06s user 0.11s system 86% cpu 0.204 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo 0.07s user 0.11s system 87% cpu 0.206 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo 0.08s user 0.11s system 90% cpu 0.209 total
This time fd found more files, which I like more, since there are probably files in the project that I would miss with rg.
Also, I believe that rg doesn't need --smart-case here, since we're filtering with fzf, which has its own parameters for case handling.
counsel-file-jump freezes Emacs for a couple of minutes when the directory is ~. Shouldn't this operation be asynchronous (preferred)? If not, can we have some argument to this function to limit the results (useful, but it should only supplement asynchrony)? I am currently using this code from https://github.com/abo-abo/swiper/issues/1404 to make it interactive, so that I don't accidentally execute it in the ~ directory: