Open qpwo opened 3 years ago
I've played with that in my head for quite a while, but never actually tried it.
The indexing tools at https://beyondgrep.com/more-tools/ may give you ideas about what you could use. Also, if you're searching for functions and variables a lot, ctags
may do 90% of what you're looking for.
Aside from the cache question: How many hundreds of MBs do you have, and how long are searches taking? One thing we've had trouble with over the years is that some folks have systems where ack takes far longer than we would expect it to, and we haven't been able to figure out why. I'm wondering if you might be in that situation as well. See #194 for example.
Tuning the OS filecache reservation and/or switching from spinning iron-oxide to SSD can greatly improve read speed.
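To make the filecache point concrete: simply reading every file once pulls the tree into the OS page cache, so subsequent ack runs read from RAM instead of disk. This is a minimal illustrative sketch (the function name `warm_cache` is invented, not part of ack or any tool mentioned here):

```python
# Warm the OS page cache by reading every byte of a tree once,
# so later searches over the same files hit RAM rather than disk.
import os

def warm_cache(root: str) -> int:
    """Read every regular file under `root`; return total bytes read."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    while chunk := fh.read(1 << 20):  # 1 MiB at a time
                        total += len(chunk)
            except OSError:
                pass  # unreadable file: skip it
    return total
```

This only helps as long as the files fit in (and stay in) free RAM; on a machine under memory pressure the cache is evicted and the first search is slow again.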
I have doubts about one tool having both a cached-index mode and a grep mode, and I wouldn't want to give up extemporaneous use of ack. Alas, the existing tools that build inverted indexes in Perl (Plucene and relatives) do not appear to be under maintenance. (There's a newer one that uses BDB or PG; uh, no.)
I've installed swish-e for both the cached-index search and the natural-language spanning-lines use cases.
While the name expands to "Simple Web Indexing System for Humans - Enhanced" and it rather expects you'll expose it locally via CGI behind Nginx or Apache, it has a command-line interface and an API too. (It's in the Ubuntu package repositories and probably available for any platform of interest, and its Perl API seems newer than any P?Luc(y|ene).) While it's designed for natural language, its recommended self-demo is searching its own source code.
I've even experimented with scanning files matched by swish-e with ack:

```shell
swish-e -x '%p\n' -w 'constance near5 snow near40 doane' | egrep -v '^#' | ack -x -C30 -iw 'snow|doane|constan[tc]e?|daniel' --pager='less -iR'
```
Before I can make full use of it, I need to figure out how to capture metadata about a document along with its contents, and what my needed metadata schema is ... ugh. I should remember from 20+ years ago in the late Web 1.0 era, when I was buying a bleeding-edge indexing engine, that Information Retrieval is NOT as easy as it looks!
I hope this isn't too naive, but I couldn't find anything on it. I have a directory with hundreds of MBs of source code, and every search takes a long time. Would it be possible to save a cached index for a directory and, on each new search, update it only for the files that have changed?
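The idea in the question can be sketched with an mtime-based cache: keep a per-file index and, on each refresh, rescan only files whose modification time changed. This is a hypothetical illustration of the approach, not a feature of ack or any tool above; the names `refresh`, `scan`, and `search` are invented:

```python
# Hypothetical sketch of an incremental search index: map each file to
# (mtime, set of words); on refresh, rescan only new or modified files.
import os
import re

def scan(path):
    """Tokenize one file into a set of lowercase words."""
    with open(path, errors="replace") as fh:
        return set(re.findall(r"\w+", fh.read().lower()))

def refresh(root, index):
    """Update `index` in place, rescanning only new or changed files."""
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            seen.add(path)
            mtime = os.stat(path).st_mtime
            entry = index.get(path)
            if entry is None or entry[0] != mtime:
                index[path] = (mtime, scan(path))
    for stale in set(index) - seen:  # forget deleted files
        del index[stale]
    return index

def search(index, word):
    """Return files whose cached word set contains `word`."""
    return sorted(p for p, (_m, words) in index.items() if word.lower() in words)
```

A real implementation would persist the index between runs (e.g. with Storable in Perl) and keep an inverted word-to-files map for speed, but the rescan-only-changed-files logic would be the same.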