Genivia / ugrep-indexer

A monotonic indexer to speed up grepping by >10x (ugrep-indexer is now part of ugrep 6.0)
https://ugrep.com
BSD 3-Clause "New" or "Revised" License

Work in progress and new features #5

Closed by genivia-inc 10 months ago

genivia-inc commented 10 months ago

A summary of work in progress and planned new features in the upcoming update (available soon):

Archives and tarballs are indexed with ugrep-indexer option -z by the files they contain. Each archived file has an index hash stored in the index file, which ug --index then uses to check whether the archive/tarball possibly matches the specified pattern. If any one of the archived files in an archive possibly matches, then ug --index will search the archive/tarball.
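To illustrate the "search the archive if any archived file possibly matches" rule, here is a minimal Python sketch. It is not ugrep-indexer's actual index format or hashing scheme: the trigram filter, `FILTER_BITS`, and all function names are assumptions made up for this example. The only property it shares with the description above is that a filter may report false positives but never rules out a file that really contains the literal.

```python
import hashlib

FILTER_BITS = 4096  # hypothetical filter size, not ugrep's real index format


def _bit(trigram: bytes) -> int:
    """Map a 3-byte sequence to a bit position in the filter."""
    digest = hashlib.sha1(trigram).digest()
    return int.from_bytes(digest[:4], "big") % FILTER_BITS


def trigrams(data: bytes) -> set:
    """All distinct 3-byte substrings of data (empty if data is shorter than 3)."""
    return {data[i:i + 3] for i in range(len(data) - 2)}


def build_filter(data: bytes) -> int:
    """Build a per-file filter: one bit set for every trigram in the file."""
    bits = 0
    for t in trigrams(data):
        bits |= 1 << _bit(t)
    return bits


def possibly_matches(filt: int, literal: bytes) -> bool:
    """False only if the literal certainly does not occur in the file."""
    # A literal shorter than 3 bytes has no trigrams, so it cannot be ruled out.
    return all((filt >> _bit(t)) & 1 for t in trigrams(literal))


def index_archive(members: dict) -> list:
    """One filter per archived file (members maps name -> content bytes)."""
    return [build_filter(content) for content in members.values()]


def should_search_archive(filters: list, literal: bytes) -> bool:
    """Search the archive if any one archived file possibly matches."""
    return any(possibly_matches(f, literal) for f in filters)


if __name__ == "__main__":
    archive = {"a.txt": b"hello world", "b.txt": b"foo bar"}
    filters = index_archive(archive)
    print(should_search_archive(filters, b"world"))
```

The sketch only handles literal strings; how real patterns are checked against the index hashes is not covered here.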

ugrep-indexer also supports --zmax=NUM to index nested archives up to NUM levels deep. The default is --zmax=1, like ugrep.
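The depth limit can be pictured with a small Python sketch. It models an archive as a nested dict (a hypothetical stand-in, not a real archive reader) and assumes that nested archives deeper than the limit are simply not expanded; what the real tool does with them is not stated here. The name `walk_archive` is made up for this example.

```python
def walk_archive(archive: dict, zmax: int = 1, level: int = 1, prefix: str = ""):
    """Yield (path, content) for files reachable within zmax archive levels.

    archive maps member names to bytes (a regular file) or to another dict
    (a nested archive). Level 1 is the outermost archive.
    """
    for name, entry in archive.items():
        path = prefix + name
        if isinstance(entry, dict):
            # Nested archive: expand it only if the depth limit allows.
            if level < zmax:
                yield from walk_archive(entry, zmax, level + 1, path + "/")
        else:
            yield path, entry


if __name__ == "__main__":
    outer = {
        "a.txt": b"x",
        "inner.zip": {"b.txt": b"y", "deep.zip": {"c.txt": b"z"}},
    }
    for depth in (1, 2, 3):
        print(depth, [path for path, _ in walk_archive(outer, zmax=depth)])
```

With the default of 1, only the files stored directly in the outer archive are visited; each increment of the limit opens one more level of nesting.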

The ug --index=MODE mode specification may be useful to control index-based search. For example, --index=fast could skip the file modification date/time check, which is useful to search a bit faster when the indexes are all kept up to date. The default is the safe behavior: always search files that were modified after indexing. Not sure yet if this is really any faster and useful. Testing will tell.
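The trade-off between the default behavior and the proposed fast mode can be sketched in Python as follows. This is only an illustration of the decision described above, not ugrep's code: the function name `must_search`, the mode label "safe", and the use of plain timestamps are assumptions, and --index=fast itself is still only a proposal at this point.

```python
def must_search(file_mtime: float, index_mtime: float,
                index_says_possible: bool, mode: str = "safe") -> bool:
    """Decide whether a file has to be searched.

    file_mtime / index_mtime are modification timestamps (seconds).
    index_says_possible is the index verdict: True if the file possibly
    matches the pattern, False if the index rules it out.
    """
    if mode not in ("safe", "fast"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Safe mode: a file modified after indexing has a stale index entry,
    # so it is always searched regardless of what the index says.
    if mode == "safe" and file_mtime > index_mtime:
        return True
    # Fast mode skips the timestamp comparison and trusts the index.
    return index_says_possible


if __name__ == "__main__":
    # File changed after indexing, index says "no match possible".
    print(must_search(200.0, 100.0, False, mode="safe"))
    print(must_search(200.0, 100.0, False, mode="fast"))
```

In this sketch the fast mode can skip a file that was changed after indexing, which is why it only makes sense when the indexes are kept up to date.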

genivia-inc commented 10 months ago

I've committed ugrep-indexer.exe to the repo with new options -z and --zmax=NUM to index archives and compressed files. I will do some more testing tomorrow and, if all goes well, will then release the ugrep-indexer 0.9.2 update.

The ugrep 4.3.5 update will also be released and supports option --index with -z to search indexed archives and compressed files. The ugrep.exe 4.3.5 also supports index-based searching. Because searching zip archives and compressed files is typically slow (relatively speaking), indexing these files gives a decent speed boost.