dajva / rg.el

Emacs search tool based on ripgrep
https://rgel.readthedocs.io
GNU General Public License v3.0

[Need help] Use rg.el to search for multiple words per file instead of per line #49

stardiviner closed this issue 5 years ago

stardiviner commented 5 years ago

Here is the original issue I posted on ripgrep: https://github.com/BurntSushi/ripgrep/issues/1110

(defun rg-direct (word)
  (format "rg %s" word))

(defun rg-files-with-matches-beginning (word dir)
  (format "rg --files-with-matches \"%s\" %s" word dir))

(defun rg-files-with-matches-middle (word)
  (format " | xargs rg --files-with-matches \"%s\"" word))

(defun rg-files-with-matches-end (word)
  (format " | xargs rg --heading \"%s\"" word))

(defun rg-files-construct-command (words dir)
  (if (= (length words) 1)
      (rg-direct (car words))
    (concat (rg-files-with-matches-beginning (car words) dir)
            ;; One `xargs rg --files-with-matches' stage per word
            ;; between the first and the last (none for two words).
            (mapconcat #'rg-files-with-matches-middle
                       (butlast (cdr words)) "")
            (rg-files-with-matches-end (car (last words))))))

;; (rg-files-construct-command
;;  (split-string "hello world stardiviner myself" " ")
;;  default-directory)

(defun rg-search-words-by-files (directory)
  "Search for multiple words per file instead of per line.

rg --files-with-matches \"foo\" . | xargs rg --files-with-matches \"bar\" | ... | xargs rg --heading \"baz\"
"
  (interactive (list (completing-read "Dir: " `(,(expand-file-name org-directory)
                                                ,default-directory))))
  (let* ((words (split-string (read-from-minibuffer "Words: ") " "))
         (command (rg-files-construct-command words directory)))
    (compilation-start command 'rg-mode)))

I have some ideas about this code,

dajva commented 5 years ago

Thanks for your interest in this package. There are some easy parts, some more complicated parts and some problematic parts of this.

  1. Providing the --files-with-matches flag should be easy with the rg-define-search macro. Just use the :flags key.
  2. Formatting the output in the rg-mode buffer is a bit more involved, since I don't think the underlying compilation-mode will recognize the format without changes. We also have the rg-group-result setting, which would definitely need some patching to work.
  3. I understand you want to match your search terms through the whole file and not just a single line so a regexp will not do, correct? There isn't a good mechanism for this in rg.el, and it does not fit very well into the rest of the package. The only reasonable integration point is the full command search feature (triggered with M-u M-x rg <RET>). That is a second-class citizen of this package, kept only for legacy reasons, so I am hesitant to extend that functionality. I think it would be OK to extend the rg-define-search macro in some way, with a :command key or similar that lets the user define a custom way of providing or modifying the full command line. With this in place you would have to build your feature around that integration point. It will still be a second-class citizen, since certain features in the results buffer simply do not work with full command search.

2 is something I would definitely accept into this package, since it could also provide an alternative to #21. If you come up with a suggestion for 3, we can discuss it and most probably land it in this package in some form. It depends on how it ends up and on its maintenance cost, though.
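A minimal sketch of item 1, assuming rg.el is installed and that the :flags key takes a list of extra command-line flags, as described in the rg.el documentation. The search name my/rg-files-with-matches is made up for illustration, and, as item 2 points out, the rg-mode buffer may not render a bare file list well.

```elisp
(require 'rg)

;; Hypothetical search name; only `rg-define-search' and its
;; :flags key come from rg.el.  All other keys keep their defaults.
(rg-define-search my/rg-files-with-matches
  "List the files that contain a match instead of the matching lines."
  :flags ("--files-with-matches"))
```

After evaluating this, M-x my/rg-files-with-matches should prompt like a regular rg search and pass --files-with-matches to ripgrep.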

stardiviner commented 5 years ago

I understand you want to match your search terms through the whole file and not just a single line so a regexp will not do, correct?

Yes that's what I want exactly.

I would go with option 3, and I am happy to discuss it. I don't know how to implement my case; can you show an example?

stardiviner commented 5 years ago

@dajva Hi, any update on this?

stardiviner commented 5 years ago

I improved my code but ran into two problems. Can you help me? Thanks. Here is my code:

;;; search multiple words in files.
(defun rg-files-with-matches-beginning (dir file-type word)
  "Construct the literal rg command part for the first WORD in DIR with FILE-TYPE."
  (format "rg --files-with-matches --type %s -e \"%s\" %s" file-type word dir))

(defun rg-files-with-matches-middle (word)
  "Construct the literal rg command part for a middle WORD."
  (format " | xargs rg --files-with-matches -e \"%s\"" word))

(defun rg-files-with-matches-end (word)
  "Construct the literal rg command part for the last WORD."
  (format " | xargs rg --heading -e \"%s\"" word))

(defun rg-files-construct-command (dir words)
  "Construct a literal rg command to search WORDS in DIR."
  (let ((file-type (completing-read "file type: "
                                    (mapcar 'car (rg-get-type-aliases))
                                    nil nil "org")))
    (if (= (length words) 1)
        (format "rg %s" (car words))
      (concat (rg-files-with-matches-beginning dir file-type (car words))
              ;; One filtering stage per word between the first and the
              ;; last (none for two words).
              (mapconcat #'rg-files-with-matches-middle
                         (butlast (cdr words)) "")
              (rg-files-with-matches-end (car (last words)))))))

(defun rg-search-words-by-files (directory)
  "Search for multiple words per file, not per line, in DIRECTORY.

The literal rg command looks like this:

rg --files-with-matches \"foo\" . | \\
xargs rg --files-with-matches \"bar\" | \\
... | \\
xargs rg --heading \"baz\"

That's it.
"
  ;; interactively select Org default directory or current directory.
  (interactive (list (completing-read "Dir: " `(,(expand-file-name org-directory)
                                                ,default-directory))))
  ;; read in multiple words as a sequence of words.
  (let* ((words (split-string (read-from-minibuffer "Words: ") " "))
         (command (rg-files-construct-command directory words)))
    ;; FIXME: rg reports 0 matches, but the result has matches.
    ;; dive into rg.el source code to figure out.
    ;; use `rg-define-search'
    (compilation-start command 'rg-mode)))

(define-key rg-prefix (kbd "M-o") 'rg-search-words-by-files)
(define-key Org-prefix (kbd "s") 'rg-search-words-by-files)

Problem 1: When I search for the three words "Emacs Linux Arch", I get errors for filenames that contain spaces. Here is part of the result:

-*- mode: rg; default-directory: "~/Org/" -*-
rg started at Mon Dec 24 10:06:26

rg --files-with-matches --type org -e "Emacs" /home/stardiviner/Org | xargs rg --files-with-matches -e "Linux" | xargs rg --heading -e "Arch"
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
/home/stardiviner/Org/Accounts/local: No such file or directory (os error 2)
accounts.org: No such file or directory (os error 2)
/home/stardiviner/Org/Learning: No such file or directory (os error 2)
Plan/Learn: No such file or directory (os error 2)
Programming: No such file or directory (os error 2)
Plan.org: No such file or directory (os error 2)
/home/stardiviner/Org/Learning: No such file or directory (os error 2)
Plan/Learning: No such file or directory (os error 2)
Plan.org: No such file or directory (os error 2)
/home/stardiviner/Org/Tasks/Computer: No such file or directory (os error 2)
Todos.org: No such file or directory (os error 2)
/home/stardiviner/Org/Wiki/Computer: No such file or directory (os error 2)
Technology/Computer: No such file or directory (os error 2)
Technology.org: No such file or directory (os error 2)
/home/stardiviner/Org/Programming: No such file or directory (os error 2)
655:- [[file:~/Org/Wiki/Computer%20Technology/Programming/Emacs/Data/Emacs%20Packages/Org%20mode/Org%20mode.org::*Capture%20-%20Refile%20-%20Archive][Capture - Refile - Archive]]

/home/stardiviner/Org/dotfiles/dotfiles.org
7:** TODO handling Arch AUR packages
14:* Install Arch Linux
91:*root* (=/=), the small usually 1G is =/boot=. If you reinstall or fix Arch Linux, you only need
116:** Pacstrap: install Arch Linux base -- =pacstrap=
227:If you are not re-install Arch Linux, you just want to fix some kernel upgrade failure.
261::   Operating System: Arch Linux
263::       Architecture: x86-64
601:mkdir -p ~/Code/Linux/ArchLinux/
1063:** After Arch Linux Installation
1072:* Arch Linux
1074:** Arch packages
1109::DESCRIPTION: Arch Linux build source file management tool.
1122:** Tools for creating Arch Linux Linux distribution
1128:** Arch Linux Configurations
1130:*** Arch Linux Theme

Problem 2:

The result does not show the number of matches.

rg finished (0 matches found) at Mon Dec 24 10:06:27

It reports (0 matches found).

dajva commented 5 years ago

Sorry for the late reply. It's the holiday season at this time of year.

Nothing is happening regarding this from my side. It looks like you got it working pretty well here. The question is whether you need to integrate this into rg.el at all. Plain compilation-mode might be sufficient.

Anyway, what you have done indicates that no special handling in rg-mode is needed for the compilation-mode matching regexps to work so my item 2 above shouldn't be needed.

  1. Don't know why you get problems with spaces. ripgrep should handle that fine I think. I would guess it's some missing escaping going on in your pipeline or similar.
  2. Match counting is triggered by the color code escape sequences in the rg output. You need to use the same colors as rg.el in your last rg invocation for this to work.
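One way to address the escaping guess in item 1 is to avoid whitespace-separated file names altogether. The sketch below is an assumption-laden illustration, not part of rg.el: my/rg-all-words-command is a hypothetical helper that asks ripgrep for NUL-terminated paths (--null) and has xargs read them with -0, which is also what the xargs error message in the log suggests. It leaves out the --type filter for brevity and expects WORDS to contain at least one non-empty string.

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: `my/rg-all-words-command' is a made-up name.

(defun my/rg-all-words-command (words dir &optional last-stage-flags)
  "Return a shell pipeline showing matches in files under DIR with all WORDS.
Every stage except the last prints NUL-terminated file names
\(rg --null) and every xargs reads them with -0, so spaces and
quotes in paths pass through untouched.  LAST-STAGE-FLAGS is an
optional list of extra flags for the final rg invocation."
  (let ((head (car words))
        (middle (butlast (cdr words)))
        (final (car (last (cdr words))))
        ;; Quote each argument for the shell and join with spaces.
        (join (lambda (args)
                (mapconcat #'shell-quote-argument args " "))))
    (if (null final)
        ;; A single word needs no pipeline.
        (funcall join (append '("rg" "--heading") last-stage-flags
                              (list "-e" head (expand-file-name dir))))
      (concat
       ;; First stage: list files under DIR that contain the first word.
       (funcall join (list "rg" "--files-with-matches" "--null"
                           "-e" head (expand-file-name dir)))
       ;; Middle stages: keep only files that also contain each word.
       (mapconcat (lambda (word)
                    (concat " | xargs -0 "
                            (funcall join (list "rg" "--files-with-matches"
                                                "--null" "-e" word))))
                  middle "")
       ;; Last stage: print the matching lines for the last word.
       " | xargs -0 "
       (funcall join (append '("rg" "--heading") last-stage-flags
                             (list "-e" final)))))))
```

The returned string can be handed to compilation-start in place of the hand-built command. LAST-STAGE-FLAGS is where color flags matching the ones rg.el uses would go for item 2; the sketch deliberately does not guess what those are. With GNU xargs, adding -r (--no-run-if-empty) keeps a stage from running rg without file arguments when the previous stage found nothing.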

As said above, I think this might better be integrated into compilation-mode directly. If you still want to use this package for your feature you can submit a patch for my item 3 above. That would mean modifying rg-search-parse-local-bindings function to handle custom forms for the :confirm key in addition to the never, always and prefix symbols. The custom forms would allow you to create your desired command line I think. Not sure if that would be much better than what you are currently doing though.