Debian / debiman

debiman generates a static manpage HTML repository out of a Debian archive
Apache License 2.0

full text search? #96

Open anarcat opened 6 years ago

anarcat commented 6 years ago

Did you ever consider implementing full text search for debiman? The idea here would be to replicate the apropos(1) and whatis(1) commands, matching what the man command already does, but also to allow complete full-text search over the entire database, something no manpage software does at the moment, as far as I know.

I did a summary evaluation of available options for this for debmans. I did some tests with Xapian, because that is what is used elsewhere on Debian.org, but I also had good experiences with ElasticSearch (ES), which is somewhat simpler to deploy and use.

I just recently stumbled upon the bleve project which seems like an interesting tool to generate search indexes. Obviously, adding search would make the first indexing much slower, but it would probably be okay on incremental indexes...

One advantage of using a pre-built implementation like Xapian or ES is that we don't have to implement anything: because we render HTML pages, those tools can just read that content themselves and there is very little code (if any) to write to hook them up. It's mostly a sysadmin job. The downside is that it is more difficult to implement "faceted" searches (e.g. "look for 'gnutls_connect' only in jessie and japanese") with pre-built tools, whereas Bleve supports structured data really well.
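To make the "faceted" idea concrete, here is a minimal sketch in Go using only the standard library. It is not Bleve, Xapian, or ES, and it is not debiman code: the `Page` type, its field names, and the `NewIndex`/`Search` functions are all hypothetical, and the tokenizer is a plain whitespace split. It only shows the shape of the technique: a full-text inverted index whose hits are then filtered on structured fields (suite and language).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Page is a hypothetical record for one rendered manpage; the field names
// are illustrative and not taken from debiman.
type Page struct {
	Name  string // e.g. "gnutls_connect.3"
	Suite string // e.g. "jessie"
	Lang  string // e.g. "ja"
	Text  string // plain-text body of the page
}

// Index maps each lowercased word to the IDs of the pages containing it.
type Index struct {
	pages    []Page
	postings map[string][]int
}

// NewIndex builds the inverted index over all pages.
func NewIndex(pages []Page) *Index {
	idx := &Index{pages: pages, postings: make(map[string][]int)}
	for id, p := range pages {
		seen := make(map[string]bool) // record each word once per page
		for _, w := range strings.Fields(strings.ToLower(p.Text)) {
			if !seen[w] {
				seen[w] = true
				idx.postings[w] = append(idx.postings[w], id)
			}
		}
	}
	return idx
}

// Search returns the sorted names of pages containing term, restricted to
// the given suite and language. An empty suite or lang means "no restriction".
func (idx *Index) Search(term, suite, lang string) []string {
	var names []string
	for _, id := range idx.postings[strings.ToLower(term)] {
		p := idx.pages[id]
		if suite != "" && p.Suite != suite {
			continue
		}
		if lang != "" && p.Lang != lang {
			continue
		}
		names = append(names, p.Name)
	}
	sort.Strings(names)
	return names
}

func main() {
	idx := NewIndex([]Page{
		{"gnutls_connect.3", "jessie", "ja", "gnutls_connect opens a session"},
		{"gnutls_connect.3", "stretch", "en", "gnutls_connect opens a session"},
		{"ls.1", "jessie", "ja", "list directory contents"},
	})
	// Facets applied: only the jessie/ja page is returned.
	fmt.Println(idx.Search("gnutls_connect", "jessie", "ja"))
	// No facets: both pages mentioning the term are returned.
	fmt.Println(idx.Search("gnutls_connect", "", ""))
}
```

A real index would also need proper tokenization, ranking, and incremental updates, which is where a library such as Bleve would presumably earn its keep.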

stapelberg commented 6 years ago

I just use “site:manpages.debian.org” in my web search engine of choice.

Note that the example cases you listed already work e.g. in Google: “site:manpages.debian.org gnutls_connect inurl:jessie” (Japanese is implied if Google knows you prefer Japanese)

Given that, I’m not sure we need to spend any effort (and DSA time) on our own, almost guaranteed to be inferior, full text search.

anarcat commented 6 years ago

"almost guaranteed to be inferior" is not a very optimistic prognosis, for sure. :p

in that case, how about adding a "search" box that points to Google? it may sound odd, but not everyone knows about the site: feature of google, or would think to use inurl:...

stapelberg commented 6 years ago

I’m fine with that personally. Would you like to send a pull request? :)

anarcat commented 6 years ago

hmm... honestly I'm not a big fan of Google so I'd rather avoid sending traffic their way. :p

ischwarze commented 5 years ago

Hi @anarcat ,

> [...] complete full-text search over the entire database, something no manpage software does at the moment, as far as I know

That is not entirely accurate: I know of one operating system where man(1) -k does support full-text search by default: NetBSD. That was implemented by Abhinav Upadhyay several years ago. Unfortunately, that has manoeuvred their apropos(1) implementation into a dead end which they have so far been unable to get out of again, so NetBSD apropos(1) has substantially fallen behind in other, more important respects.

That said, full text search is of very little relevance for manual pages. At least for manual pages written in the good mdoc(7) language, almost everything that people might want to search for is marked up with semantic macros, so semantic search is much more relevant for manual pages than full-text search. For a practical example of how that works on the web, see https://man.openbsd.org/ - the apropos(1) manual page link on the start page explains the details.
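As a rough illustration of what "semantic search" over mdoc(7) markup means, here is a deliberately naive Go sketch. It is not how mandoc or man.openbsd.org work internally: `ExtractMacros`, `Match`, the `Key=value` query form (exact match only), and the sample page are all hypothetical. The scan is line-based and ignores macros nested on other macro lines, quoting, and punctuation handling that a real parser performs.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ExtractMacros scans mdoc(7) source and collects the first argument of a
// few semantic macros (name, function, flag, environment variable), keyed
// by macro name. Only lines that start with one of these macros are seen.
func ExtractMacros(src string) map[string][]string {
	wanted := map[string]bool{"Nm": true, "Fn": true, "Fl": true, "Ev": true}
	out := make(map[string][]string)
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || !strings.HasPrefix(fields[0], ".") {
			continue // not a macro line, or a macro without arguments
		}
		macro := strings.TrimPrefix(fields[0], ".")
		if wanted[macro] {
			out[macro] = append(out[macro], fields[1])
		}
	}
	return out
}

// Match reports whether a query of the hypothetical form "Key=value"
// matches one of the extracted macro arguments exactly.
func Match(macros map[string][]string, query string) bool {
	key, val, ok := strings.Cut(query, "=")
	if !ok {
		return false
	}
	for _, v := range macros[key] {
		if v == val {
			return true
		}
	}
	return false
}

func main() {
	// A made-up page for a made-up utility called "frob".
	page := ".Sh NAME\n.Nm frob\n.Nd frobnicate files\n" +
		".Sh DESCRIPTION\nThe verbose flag is\n.Fl v\n" +
		".Sh ENVIRONMENT\n.Ev FROB_HOME\n"
	macros := ExtractMacros(page)
	fmt.Println(Match(macros, "Nm=frob")) // the page is named frob
	fmt.Println(Match(macros, "Fl=v"))    // the page documents a -v flag
	fmt.Println(Match(macros, "Fn=frob")) // no function of that name is marked up
}
```

The point of the sketch is that a query can ask "which pages document a flag called v" rather than "which pages contain the letter v", which plain full-text search cannot distinguish.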