astoff / devdocs.el

Emacs viewer for DevDocs

Add devdocs-grep command #15

Open astoff opened 2 years ago

astoff commented 2 years ago

@minad If you want to discuss more about the grep command, we can do it here.

The current version works, but is completely synchronous. It should be possible to add some degree of asynchronicity using timers. There's also the async package, but I'm not sure I want to depend on it.

As to the possibility of pre-rendering the HTML files at installation time, is it even possible to serialize a buffer with all its text properties? I would also have to worry about invalidating the pre-rendered pages when some shr customization changes, so this looks quite complicated in the end.

Finally, leaving speed considerations aside for a moment: is the possibility of a Consult integration precluded by doing things in Lisp? Can you specify an async source from a normal buffer instead of a process buffer?

minad commented 2 years ago

> The current version works, but is completely synchronous. It should be possible to add some degree of asynchronicity using timers. There's also the async package, but I'm not sure I want to depend on it.

My proposal would rather be to provide different pluggable frontends here, such that the user can plug in consult-grep in their init.el. The default implementation could be based on the default grep. Would something like this work? I would rather you not implement your own asynchronous grep here; you would just unnecessarily duplicate the work done in consult-grep etc. I had imagined a very simple integration:

  1. Prerender the text, keep it in a directory separate from the html
  2. Introduce a devdocs-grep command which calls a devdocs-grep-function within the devdocs directory. The function should return the selected file.

> As to the possibility of pre-rendering the HTML files at installation time, is it even possible to serialize a buffer with all its text properties? I would also have to worry about invalidating the pre-rendered pages when some shr customization changes, so this looks quite complicated in the end.

Then we cannot use grep anymore?

> Finally, leaving speed considerations aside for a moment: is the possibility of a Consult integration precluded by doing things in Lisp? Can you specify an async source from a normal buffer instead of a process buffer?

Yes, we cannot scan buffers asynchronously.

minad commented 2 years ago

Ah okay, now I looked at your code. This is different from what I had in mind. If it works well, why not? But maybe my idea is worth exploring too? With an external grep and consult-grep we could enjoy the asynchronicity, and with Consult the live-updating search.

astoff commented 2 years ago

> If it works well, why not?

It works correctly, but is super slow. By the time you start grepping a well-indexed document, you are pretty desperate, so it might be acceptable, though. In any case, I'm not going to merge this right away.

minad commented 2 years ago

> It works correctly, but is super slow. By the time you start grepping a well-indexed document, you are pretty desperate, so it might be acceptable, though. In any case, I'm not going to merge this right away.

Okay, my opinion is that this should not be added. I don't see a point in having slow commands around in particular if we have better tools like ripgrep. This leaves us with either the isearch solution or the solution I proposed above with the pregenerated text files.

astoff commented 2 years ago

I've added a new commit making the search asynchronous (but still single-threaded). You might be curious to check out how it works :-).

minad commented 2 years ago

So the results will come in live. That's nice. But I would still go with another solution. I am not sure if devdocs should invent its own search command, given that we have good alternatives. I am all for decoupling packages and having clear responsibilities. But maybe you can make this search command so much better by specializing it to devdocs, such that it will be worth it.

astoff commented 2 years ago

> I don't see a point in having slow commands around in particular if we have better tools like ripgrep.

Can you ripgrep HTML files without seeing garbage in the output? I'm not even sure all the documents have meaningful line breaks! One can use a pipe pandoc --to plain | grep, but then the reported line numbers are not meaningful, and I'm not sure how to deal with that (also, even on Fedora there doesn't seem to be any html-to-text command installed by default).

(As a desperate measure, I could grep the raw HTML to decide if a given document page contains matches, and only then do the full-blown shr thing.)
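A minimal sketch of that two-stage idea, written in Python rather than Emacs Lisp purely for illustration. Everything here is an assumption, not devdocs.el code: `render_text` is a crude stand-in for the expensive shr rendering, and `two_stage_search` and the `pages` mapping are hypothetical names. Note the caveat that the cheap first stage can miss matches that are split by tags or written as character entities in the raw HTML, so it is a heuristic filter only.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document (crude stand-in for shr)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def render_text(html):
    """The 'expensive' step: turn raw HTML into plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


def two_stage_search(pages, pattern):
    """Return {page name: matching rendered lines}.

    `pages` maps a page name to its raw HTML source (hypothetical layout).
    """
    regex = re.compile(pattern)
    results = {}
    for name, html in pages.items():
        # Stage 1: cheap scan of the raw HTML; skip pages with no hit at all.
        if not regex.search(html):
            continue
        # Stage 2: render only the candidates and search the rendered lines.
        lines = [ln for ln in render_text(html).splitlines() if regex.search(ln)]
        if lines:
            results[name] = lines
    return results
```

The second stage also discards false positives from the first, such as a page where the pattern only occurs inside a tag attribute.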

minad commented 2 years ago

> Can you ripgrep HTML files without seeing garbage in the output?

I believe there is a ripgrep wrapper which can do that, and also search PDFs etc. But I already proposed a better solution - pregenerate the text files with shr. What is wrong with that? You can either pregenerate when the docs are fetched or when the search is started for the first time. But I would probably do it after the fetch, since you probably already do some post-processing of the fetched data?

astoff commented 2 years ago

> What is wrong with that?

Mainly two things: one, the line numbers will get out of sync if you change your fonts or any other shr config; and two, installing docs will take a lot longer (perhaps 5 minutes for OpenJDK!).
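The first concern could in principle be handled by keying pre-rendered pages on a fingerprint of the rendering settings, so that a config change simply misses the cache instead of serving stale line numbers. Below is a rough sketch in Python (not Emacs Lisp) under stated assumptions: `render_config_key`, `PrerenderCache`, and the `config` dictionary are hypothetical names, and the settings are assumed to be JSON-serializable. It does nothing about the second concern, the longer install time.

```python
import hashlib
import json


def render_config_key(config):
    """Stable fingerprint of the rendering settings (hypothetical helper)."""
    # sort_keys makes the fingerprint independent of dictionary order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PrerenderCache:
    """Caches rendered pages per (page, rendering config) pair."""

    def __init__(self, render):
        self._render = render  # callable: (html, config) -> rendered text
        self._store = {}

    def get(self, page, html, config):
        # Assumes the HTML of a given page name does not change in between.
        key = (page, render_config_key(config))
        if key not in self._store:
            # Cache miss: first request, or the config changed since rendering.
            self._store[key] = self._render(html, config)
        return self._store[key]
```

With this layout a page is re-rendered lazily the first time it is requested under a new configuration, rather than all at once at installation time.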

Anyway, I will sleep on this for some time :-).