abo-abo / swiper

Ivy - a generic completion frontend for Emacs, Swiper - isearch with an overview, and more. Oh, man!
https://oremacs.com/swiper/

counsel-file-jump freezes Emacs when I run it in the home directory? #1525

Open ghost opened 6 years ago

ghost commented 6 years ago

counsel-file-jump freezes Emacs for a couple of minutes when the directory is ~. Shouldn't this operation be asynchronous (preferred)? If not, can we have some sort of argument for this function to limit the results (useful, but it should only supplement asynchrony)?

I am currently using this code from https://github.com/abo-abo/swiper/issues/1404 to make it prompt for a directory, so that I don't accidentally execute it in the ~ directory:

(defun my-counsel-file-jump ()
  "Forward to `counsel-file-jump' with non-nil prefix argument."
  (interactive)
  (setq current-prefix-arg '(4))
  (call-interactively #'counsel-file-jump))
basil-conto commented 6 years ago

counsel-file-jump (not counsel-jump-file) is currently synchronous, but it would be nice to make it asynchronous, indeed.

abo-abo commented 6 years ago

I agree that making counsel-file-jump async would be a good improvement.

In the meantime, I suggest counsel-fzf as a better-performing alternative. One potential downside is if you don't like fuzzy matching: fzf's matcher is similar to ivy--regex-fuzzy.

Even so, running counsel-fzf on the home folder isn't a good idea: since there's no index, the whole file tree has to be checked, and that takes too long. For that case we have counsel-locate, which relies on the system's index.

abo-abo commented 6 years ago

Just finished running time find ~ on my non-SSD laptop; it took 9m29s. counsel-locate is so much better at looking for files in the home dir that it might be a good idea to use it automatically when the home directory is detected.
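That fallback could be sketched as a small wrapper (a minimal sketch; the function name is hypothetical, not part of counsel):

```elisp
;; Hypothetical wrapper: use `counsel-locate' (index-backed) instead of a
;; full `find' traversal when invoked from the home directory.
(defun my-counsel-file-jump-or-locate ()
  "Run `counsel-file-jump', or `counsel-locate' when in the home directory."
  (interactive)
  (if (file-equal-p (expand-file-name default-directory)
                    (expand-file-name "~"))
      (counsel-locate)
    (counsel-file-jump)))
```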

ghost commented 6 years ago

Thanks, @abo-abo. counsel-fzf is nice. How is it that counsel-fzf is asynchronous but counsel-file-jump is not? counsel-file-jump uses find and counsel-fzf uses fzf -f.

Does it have something to do with fzf itself, like it sends a fixed-size buffer to read or something? Just curious to know.

Overall I think running counsel-git in git directories and counsel-fzf in non-git directories seems like a good solution. I'll keep counsel-locate in mind too. Usually I don't search in ~!
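That dispatch could look roughly like this (hedged sketch; my-counsel-project-files is a made-up name, not part of counsel):

```elisp
;; Hypothetical dispatcher: `counsel-git' inside a git checkout,
;; `counsel-fzf' everywhere else.
(defun my-counsel-project-files ()
  "Use `counsel-git' in git repositories, `counsel-fzf' otherwise."
  (interactive)
  (if (locate-dominating-file default-directory ".git")
      (counsel-git)
    (counsel-fzf)))
```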

andreyorst commented 5 years ago

Allowing candidates to be passed in and narrowed asynchronously (much like plain fzf, which doesn't wait until the command producing the candidates has finished) would be a nice improvement.

Alexander-Shukaev commented 5 years ago

The problem is that the find utility does not accept regular expressions, which one would expect given the search-as-you-type behavior of this package. It instead uses a globbing syntax, for which we would need a crippled translator.

Alexander-Shukaev commented 5 years ago

That's the reason why we first consume all output of find utility and only then filter it with Emacs regular expressions.

andreyorst commented 5 years ago

The problem is that the find utility does not accept regular expressions, which one would expect given the search-as-you-type behavior of this package.

I don't see how regular expressions are related to fuzzy matching and asynchronous candidate narrowing.

That's the reason why we first consume all output of find utility and only then filter it with Emacs regular expressions.

fzf works with find via a pipe, and it doesn't need to wait for the full list of candidates to narrow it or do fuzzy matching.

Alexander-Shukaev commented 5 years ago

Exactly; hence you are essentially asking for a reimplementation of fzf in Emacs Lisp. The profit is questionable.

andreyorst commented 5 years ago

Exactly; hence you are essentially asking for a reimplementation of fzf in Emacs Lisp. The profit is questionable.

No, I'm not. We already have fuzzy matching in Ivy. It just doesn't accept candidates over time; it waits for the full list of candidates.

Alexander-Shukaev commented 5 years ago

Emacs is single-threaded; how will you achieve that?

andreyorst commented 5 years ago

Emacs has a built-in eshell written in Elisp, which doesn't block Emacs while running find, so I guess there are ways to start a process and read its output periodically.
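There are: `make-process` (Emacs 25+) runs a command asynchronously, and its filter function receives output chunks as they arrive while Emacs stays responsive. A minimal sketch:

```elisp
;; Sketch: run `find' asynchronously; the :filter function is called with
;; each chunk of output as it arrives, without blocking the UI.
(make-process
 :name "async-find"
 :command '("find" "." "-type" "f")
 :filter (lambda (_proc chunk)
           ;; A real consumer would accumulate CHUNK, split it on newlines,
           ;; and feed complete lines to the completion UI.
           (message "received %d bytes" (length chunk)))
 :sentinel (lambda (_proc event)
             (message "find finished: %s" event)))
```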

Alexander-Shukaev commented 5 years ago

Sure, but in this case you would search for everything and then filter in Emacs. That means you will potentially blow up memory consumption, suffer from GC, and it will be slow anyway, since reading a portion of the output in some time window and then filtering it is sequential (no threads). Furthermore, if you change the input, we have to start filtering that huge collection of candidates from scratch. This will not scale.

A better approach is what I mentioned earlier: re-trigger the find invocation as you type, on each character. fzf's async integration works the same way. The problem then is the filtering syntax of the find utility. The solution is to either implement a translator, or take a bold move and simply pipe the output through the grep utility first, which is easy to apply filtering with, and only feed the resulting filtered output to Emacs. The advantages in ease of implementation and performance are obvious.
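The proposed pipeline is easy to try outside Emacs; for example (the paths below are purely illustrative):

```shell
# Illustrative per-keystroke pipeline: `find' enumerates files once per
# invocation, `grep -E' applies the regex built from the current input.
mkdir -p /tmp/filter-demo/src
touch /tmp/filter-demo/src/config.org /tmp/filter-demo/src/main.c
find /tmp/filter-demo -type f | grep -E 'config'
# -> /tmp/filter-demo/src/config.org
```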

andreyorst commented 5 years ago

Sure, but in this case you would search for everything and then filter in Emacs.

And that's exactly how counsel-file-jump works. But it waits for the full list of candidates instead of accepting them over time.

Alexander-Shukaev commented 5 years ago

And your initial proposal will not improve it much; see above.

andreyorst commented 5 years ago

And your initial proposal will not improve it much; see above.

Oh, it would be a great time-saver. Instead of waiting 8 seconds for 24k candidates every time I invoke this command, I may get the first 3k in 1 second, and the file I'm looking for may already be listed among those, so I can open it before the full list is built and filtered. I can also type my fuzzy string during that first second. Currently I need to wait 8 seconds, plus the 1 second when I actually type what I want. Even if my file is at the very end of the candidate list, in the worst case it will take the same time as the current implementation, minus the second I've spent typing.

I'd rather sacrifice machine resources to get my file faster, instead of waiting every time for no reason: the file I'm looking for might be among the first hundred candidates, yet I have to wait for all 24k.

Alexander-Shukaev commented 5 years ago

Did you read through my comparison between what you propose and what I propose? I'm not saying that the current implementation is good or optimal. Forget about it; what I'm saying is that your proposal will not fly much further. Go ahead and implement it if you are interested. That implementation will be unnecessarily convoluted and will underperform compared to external tools doing all the heavy lifting. Replacing the counsel-{file,directory}-jump implementation with an asynchronous find ... | grep ... will do the trick.

Alexander-Shukaev commented 5 years ago

To take it further, I could replace grep with rg, for example, and as a result benefit from parallelized filtering, which you will never get in Emacs Lisp.

andreyorst commented 5 years ago

If you're proposing that counsel-file-jump work by starting a new process on every keystroke, then that is the absolute worst way of doing fuzzy matching over a list of files, especially when we already see that counsel-rg isn't really performant because of that. Forking a process on each keystroke will make things even worse on slow CPUs and memory, because the time to shut an application down is significant. It doesn't matter how fast you're filtering with rg if it is slow to start and finish. I experience such problems with tools like rg run under skim -i -c, which essentially runs a new rg on every keypress, and that's a pretty fast tool, written in a pretty fast language compared to Elisp.

Alexander-Shukaev commented 5 years ago

Let's put it simply: are you happy with counsel-fzf?

andreyorst commented 5 years ago

No, I'm using counsel-file-jump because it actually works faster. Since counsel-fzf recalculates the whole list on every keypress, if you've typed a fuzzy string that doesn't match and need to delete it, you wait the whole time again for all the relevant candidates.

Alexander-Shukaev commented 5 years ago

On slow CPUs and memory your approach will be even slower. The point is to move as much resource allocation outside Emacs as possible, because Emacs doesn't do well combining user interactivity with heavy computational/memory load. Termination of the child can be done asynchronously; that is not really an argument here. I don't know what kind of system/hardware you're running, but I'd say arguments about forking a process on Unix on every keystroke, from a human-interactivity perspective, sound ridiculous in the present decade.

Since counsel-fzf recalculates the whole list on every keypress, if you've typed a fuzzy string that doesn't match and need to delete it, you wait the whole time again for all the relevant candidates.

How does your Emacs Lisp candidate filtering alleviate this issue? When you change the input pattern, somebody has to refilter the whole original input collection and rebuild the filtered collection from scratch to display it. The difference is that with your proposal, Emacs will be choking on all that heavy filtering in one thread, on top of the large original input collection (e.g. when you invoke find from /), on each keystroke. (Yes, that huge input collection has to be read only once by Emacs; so what? That's not the only bottleneck here, and since the input collection is still of the same huge size as in the current implementation, it does not improve much.) In my approach, Emacs only consumes a relatively small portion of already filtered (potentially in parallel) output, asynchronously, in order to display it to the user.

andreyorst commented 5 years ago

The point is to move as much resource allocation outside Emacs as possible, because Emacs doesn't do well combining user interactivity with heavy computational/memory load.

Then we need a proper tool, and to call it somewhere where we don't have to worry about managing allocations, doing heavy computations, and displaying the results. Oh wait, that means we just need to run fzf from the terminal. Ivy is extremely performant at showing and filtering a really big number of files once the list of candidates is complete. I'm talking about filtering the worst case (the last file in the list, matched by an 84-letter fuzzy string pasted from the clipboard) out of 90k files. It happens immediately. The only problems here are that building the list of 90k files takes more time than the filtering, and that Ivy doesn't seem to do incremental matching.

How does your Emacs Lisp candidate filtering alleviate this issue? When you change the input pattern, somebody has to refilter the whole original input collection and rebuild the filtered collection from scratch to display it.

Incremental filtering gets faster with each keystroke, since fewer candidates need to be filtered. Caching previous results helps when deleting. In contrast, when we invoke a new process on each keystroke, we build everything from scratch, because the tools we're using are not implemented with incremental usage in mind. Of course, when we insert an absolutely new pattern, we have to do all the computing from the ground up in both variants, but incremental filtering is still much faster.
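The caching idea can be sketched in a few lines (a hypothetical helper, not Ivy's actual implementation; plain substring matching stands in for a real fuzzy matcher):

```elisp
(require 'seq)
(require 'subr-x)

;; Sketch: when the new input merely extends the previous one, narrow the
;; cached match set instead of re-filtering the full candidate list.
(defvar my-filter--last-input "")
(defvar my-filter--last-matches nil)

(defun my-filter (input candidates)
  "Filter CANDIDATES by substring INPUT, reusing the previous result set."
  (let ((pool (if (and (not (string-empty-p my-filter--last-input))
                       (string-prefix-p my-filter--last-input input))
                  my-filter--last-matches ; extend: narrow the cached subset
                candidates)))             ; new pattern: start from scratch
    (setq my-filter--last-input input
          my-filter--last-matches
          (seq-filter (lambda (c) (string-match-p (regexp-quote input) c))
                      pool))))
```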

Alexander-Shukaev commented 5 years ago

Since my papers are not yet signed by my employer, I will simply leave the code here. To prove my point, the following is a blazing fast replacement of counsel-*-jump functions:

(require 'counsel)

(defun counsel--projectile-project-root ()
  "Return root of current project or nil on failure.
Use `projectile-project-root' to determine the root."
  (and (fboundp 'projectile-project-p)
       (fboundp 'projectile-project-root)
       (projectile-project-p)
       (projectile-project-root)))

(defvar counsel-filter-file-root-functions
  '(counsel--projectile-project-root
    counsel--project-current
    counsel--git-root
    counsel--configure-root
    counsel--dir-locals-root)
  "Special hook to find the project root for `counsel-filter-file'.
Each function on this hook is called in turn with no arguments
and should return either a directory, or nil if no root was
found.")

(defun counsel-filter-file-root ()
  "Return root of current project or `default-directory'.
The root is determined by `counsel-filter-file-root-functions'."
  (or (run-hook-with-args-until-success 'counsel-filter-file-root-functions)
      default-directory))

(defcustom counsel-filter-file-dir-function #'counsel-filter-file-root
  "Function that returns a directory for `counsel-filter-file'."
  :type 'function)

(defcustom counsel-filter-file-find-args "-name '.git' -prune -o -type f -print"
  "Arguments for `find-program' used by `counsel-filter-file'."
  :type 'string)

(defcustom counsel-filter-file-grep-args "-E -e %s"
  "Arguments for `grep-program' used by `counsel-filter-file'."
  :type 'string)

(defvar counsel-filter-file-command nil
  "Command for `counsel-filter-file'.")

(defvar counsel-filter-file-dir nil
  "Directory for `counsel-filter-file'.")

(defun counsel-filter-file-function (string)
  (or (ivy-more-chars)
      (let ((default-directory counsel-filter-file-dir)
            (regex (replace-regexp-in-string "\n" "" (counsel--grep-regex string))))
        (counsel--async-command (format counsel-filter-file-command
                                        (shell-quote-argument regex)))
        nil)))

(defun counsel-filter-file-action (f)
  (with-ivy-window
    (let ((default-directory counsel-filter-file-dir))
      (find-file f))))

;;;###autoload
(defun counsel-filter-file (&optional initial-input initial-directory prompt action)
  (interactive (list nil
                     (when current-prefix-arg
                       (read-directory-name "From directory: "))))
  (counsel-require-program find-program)
  (counsel-require-program grep-program)
  (setq counsel-filter-file-command
    (mapconcat #'identity
               (list find-program
                     " . "
                     counsel-filter-file-find-args
                     (when (fboundp 'make-process)
                       (concat " 2> " (shell-quote-argument null-device)))
                     " | cut -c 3- | "
                     grep-program
                     counsel-filter-file-grep-args
                     (unless (string-match-p "%s"
                                             counsel-filter-file-grep-args)
                       " %s"))
               " "))
  (setq counsel-filter-file-dir
    (or initial-directory
        (if (functionp counsel-filter-file-dir-function)
            (funcall counsel-filter-file-dir-function)
          default-directory)))
  (ivy-read (or prompt "Filter file: ")
            #'counsel-filter-file-function
            :initial-input initial-input
            :dynamic-collection t
            :preselect (counsel--preselect-file)
            :history 'file-name-history
            :keymap counsel-find-file-map
            :action (or action #'counsel-filter-file-action)
            :unwind #'counsel-delete-process
            :caller 'counsel-filter-file))

(defcustom counsel-filter-directory-find-args "-name '.git' -prune -o -type d -print"
  "Arguments for `find-program' used by `counsel-filter-directory'."
  :type 'string)

(defcustom counsel-filter-directory-grep-args "-E -e %s"
  "Arguments for `grep-program' used by `counsel-filter-directory'."
  :type 'string)

(defun counsel-filter-directory-action (d)
  (with-ivy-window
    (let ((default-directory counsel-filter-file-dir))
      (dired-jump nil (expand-file-name d)))))

;;;###autoload
(defun counsel-filter-directory (&optional initial-input initial-directory prompt action)
  (interactive (list nil
                     (when current-prefix-arg
                       (read-directory-name "From directory: "))))
  (let ((counsel-filter-file-find-args counsel-filter-directory-find-args)
        (counsel-filter-file-grep-args counsel-filter-directory-grep-args))
    (counsel-filter-file initial-input
                         initial-directory
                         (or prompt "Filter directory: ")
                         (or action #'counsel-filter-directory-action))))

Please, feel free to test and integrate if you like it.

andreyorst commented 5 years ago

To prove my point, the following is a blazing fast replacement of counsel-*-jump functions

This code works, but it doesn't provide any sort of fuzzy matching, so it is completely obliterated by any counsel-*-jump function; and since counsel-fzf already does the same thing while allowing fuzziness, I don't see how this code is any better.

Alexander-Shukaev commented 5 years ago

This code works, but it doesn't provide any sort of fuzzy matching, so it is completely obliterated by any counsel-*-jump function; and since counsel-fzf already does the same thing while allowing fuzziness, I don't see how this code is any better.

I've just changed the code to support any regular-expression builder, including the fuzzy one, ivy--regex-fuzzy. Nothing is obliterated by anything. counsel-*-jump is an unacceptable solution in the long run; the above works much better in every respect. It also has the additional advantage of portability compared to fzf: not every box has fzf installed. A very common example is restricted corporate environments with remote boxes where one works via TRAMP, to which I and many others are exposed a lot during regular working hours.

On a side note, you criticize a lot but don't implement the solution you are advocating. Why don't you go ahead and implement it, so that we can profile and compare which one performs better on huge collections and which implementation is simpler?

Alexander-Shukaev commented 5 years ago

@abo-abo, if you decide to integrate this, then my last comment from #1812 applies again. In brief, if we let external tools do the fuzzy matching, as is done for the counsel-filter-* functions above and already for the counsel-*g functions, then for better relevance of the fuzzy-filtered output returned by these tools we should sort it in some clever way using a scoring algorithm, applied as the last post-processing stage in Emacs itself. Not sure whether flx is well suited for this or we need to roll our own. Does Ivy offer any facility to perform such sorting on an arbitrary candidate collection?
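If flx turns out to be usable for this, the post-processing stage might look roughly like the following (a hedged sketch; it assumes the third-party flx package, whose flx-score returns a list whose car is the score, or nil on no match):

```elisp
(require 'flx) ; third-party package, not bundled with Emacs

;; Sketch: re-sort externally filtered candidates by flx score, best first.
(defun my-sort-by-flx-score (query candidates)
  "Return CANDIDATES sorted by `flx-score' against QUERY, highest first."
  (sort (copy-sequence candidates)
        (lambda (a b)
          (> (or (car (flx-score a query)) most-negative-fixnum)
             (or (car (flx-score b query)) most-negative-fixnum)))))
```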

andreyorst commented 5 years ago

On a side note, you criticize a lot but don't implement the solution you are advocating. Why don't you go ahead and implement it, so that we can profile and compare which one performs better on huge collections and which implementation is simpler?

I guess "you don't have to be a chef to criticize food" is a valid point.

Originally I just came here because there was a discussion of a feature that I, as a user, am interested in. I'm not an Elisp programmer; I don't know the language well enough to implement such a thing, but that doesn't restrict me in any way from criticizing stuff.

I fully understand that "if you criticize someone's solution, offer your own" is an absolutely one-hundred-percent valid point too, but as I said, I don't know Lisp well enough, and my initial interest was in the ability to accept candidates asynchronously, since that could also be used by other sources like projectile, which was discussed there too: https://github.com/bbatsov/projectile/issues/1307. Since this particular issue is related to that problem, I decided to add my vote for the feature in the comments.

I've just changed the code to support any regular-expression builder, including the fuzzy one, ivy--regex-fuzzy.

I'm going to try it out then

Nothing is obliterated by anything. counsel-*-jump is an unacceptable solution in the long run; the above works much better in every respect.

There's one point I'm not sure about. I'm using a lot of servers simultaneously, and I work with remote files via different programs quite frequently, which may not feature remote file access themselves. I'm using sshfs for that, since it's a universal way for all programs to access remote files, and it's pretty fast. For example, if my colleague wants to do something on my PC with me, I can fire up his favorite editor for him and provide access to the files natively, as on his own PC. So sshfs is a must for my workflow, and I can't abandon it. Wouldn't starting the find process over an sshfs-mounted volume be less performant, compared to a solution like counsel-*-jump where we obtain the list of files once? I'm not sure, since I don't know the mechanics of sshfs mounting well enough, nor how often it re-fetches files; wouldn't every call to find initiate a fetch and slow everything down?

A very common example is restricted corporate environments with remote boxes where one works via TRAMP.

That, by the way, is one of the reasons why I use sshfs: I can have any program on my PC and work with the server's files without installing anything on the server, which also has a horribly outdated package base.

Alexander-Shukaev commented 5 years ago

With respect to sshfs, that's an interesting question. I guess it will all depend on the mounting options/parameters related to caching. It's no different from NAS in this respect. When using transparently mounted file systems to do any type of work, one has to take into account what kind of access patterns are expected and configure the mounting options accordingly. Of course there is nothing we can do about it from Emacs; it's a system-configuration problem. All one can do is check the manual and profile. I'd be interested in your results.

mookid commented 5 years ago

I'd say arguments about forking a process on Unix on every keystroke, from a human-interactivity perspective, sound ridiculous in the present decade.

I did not follow the whole thing, but Windows users don't have the same story here.

Alexander-Shukaev commented 5 years ago

Indeed, they don't. However, this needs to be profiled, to prove that one's typing experience while entering the input pattern is actually impaired on Windows by process spawning. Otherwise all these discussions are speculation.

andreyorst commented 5 years ago

When using transparently mounted file systems to do any type of work, one has to take into account what kind of access patterns are expected and configure mounting options accordingly.

Some notes about that. I'm using fzf every day (not counsel-fzf, but my own script for another editor, Kakoune) to access files on an sshfs file system with no issues, but it runs find only once and filters the whole thing. However, when using grep or even ripgrep via skim over an sshfs file system, I experience serious lag, compared to less noticeable lag (and practically no lag with rg) on local storage.

Alexander-Shukaev commented 5 years ago

Could you test performance of FZF_DEFAULT_COMMAND=rg --files --hidden --no-ignore --no-messages --smart-case --glob '!.git/*'?

andreyorst commented 5 years ago

Could you test performance of FZF_DEFAULT_COMMAND=rg --files --hidden --no-ignore --no-messages --smart-case --glob '!.git/*'?

I'm using fd instead of find because it respects .gitignore. I'll be able to test with rg tomorrow.

Also, there's another problem with your filter function. When I filter files with counsel-file-jump, I need to type only coorg to get config.org first in the list; the same goes for counsel-fzf or plain fzf. But since grep does arbitrary matching and no sorting, coorg matches config.org but it ends up far from the beginning, which isn't quite right. Even if I input the full match config.org, it's still not the first candidate:

counsel-filter-file with full match (with fuzzy it's even lower in the list): [screenshot]

counsel-file-jump with fuzzy match: [screenshot]

counsel-file-jump with full match: [screenshot]

counsel-fzf with fuzzy match (I'm not using it with full matches since there's no reason to): [screenshot]

Obviously that's because there are a hell of a lot of .config/ matches, but fzf handles that perfectly by understanding that I'm searching for a file that might be in a directory, not searching for a directory. And there are even smarter matchers out there.

Alexander-Shukaev commented 5 years ago

This is exactly the issue I referred to in the comment above (see also the end of #1812). By the way, rg is faster than fd for file traversal (because of its parallel iterator). Let's see your results tomorrow.

andreyorst commented 5 years ago

Could you test performance of FZF_DEFAULT_COMMAND=rg --files --hidden --no-ignore --no-messages --smart-case --glob '!.git/*'?

I did some rough measurements like so:

$ fd . --no-ignore --type f --hidden --exclude .git --exclude .svn | wc -l
75382
$ rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' | wc -l 
100738
$ for i in 1 2 3 4 5; do time fd . --no-ignore --type f --hidden --exclude .git --exclude .svn >/dev/null; done
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null  0.71s user 1.77s system 17% cpu 14.422 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null  0.40s user 0.91s system 110% cpu 1.187 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null  0.50s user 0.87s system 116% cpu 1.178 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null  0.42s user 0.93s system 112% cpu 1.195 total
fd . --no-ignore --type f --hidden --exclude .git --exclude .svn > /dev/null  0.46s user 0.88s system 117% cpu 1.152 total
$ for i in 1 2 3 4 5; do time rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' >/dev/null; done 
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob  >   1.61s user 3.38s system 21% cpu 23.116 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob  >   1.46s user 3.05s system 21% cpu 20.892 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob  >   0.92s user 1.77s system 95% cpu 2.825 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob  >   0.91s user 1.85s system 81% cpu 3.387 total
rg --files --hidden --no-ignore --no-messages --glob '!.git/*' --glob  >   0.72s user 2.10s system 73% cpu 3.827 total

This is a scan of the whole remote /home/user/ directory. fd is generally faster than rg. As for counsel-fzf performance, I feel it is better with fd too. I don't know how to measure the performance of an Emacs command.
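One rough way is `benchmark-run`, which times an arbitrary Elisp body and also reports GC overhead (here wrapping a synchronous shell call for illustration; a fully interactive command that reads from the minibuffer can't be timed this way, and fd on PATH is assumed):

```elisp
;; Returns a list (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
(benchmark-run 1
  (shell-command-to-string "fd . --type f"))
```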

When scanning a remote project this way, the results are similar, except that rg was a little bit faster; but I guess I need to drop the caches somehow before the first run to get proper results:

$ fd --hidden --follow --no-ignore --exclude .svn --exclude .git | wc -l 
10952
$ rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' | wc -l 
9240
$ for i in 1 2 3 4 5; do time fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn >/dev/null; done
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn >   0.07s user 0.25s system 35% cpu 0.900 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn >   0.03s user 0.15s system 86% cpu 0.199 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn >   0.07s user 0.11s system 87% cpu 0.202 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn >   0.05s user 0.14s system 88% cpu 0.207 total
fd . --no-ignore --follow --type f --hidden --exclude .git --exclude .svn >   0.07s user 0.11s system 89% cpu 0.207 total
$ for i in 1 2 3 4 5; do time rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glob '!.svn/*' >/dev/null; done
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo  0.11s user 0.12s system 88% cpu 0.259 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo  0.07s user 0.11s system 87% cpu 0.201 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo  0.06s user 0.11s system 86% cpu 0.204 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo  0.07s user 0.11s system 87% cpu 0.206 total
rg --files --hidden --follow --no-ignore --no-messages --glob '!.git/*' --glo  0.08s user 0.11s system 90% cpu 0.209 total

This time fd found more files, which I like more, since there are probably files in the project that I would miss with rg.

Also, I believe rg doesn't need --smart-case here, since we're filtering with fzf, which has its own parameters for case sensitivity.