Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License
2.56k stars, 109 forks

[FR] 90% speed up by refactoring and optimizing some code #385

Closed by genivia-inc 3 months ago

genivia-inc commented 4 months ago

ugrep can run faster if the search logic is refactored: break up the large code block in advance() into separate functions that are dispatched directly, e.g. by a switch or a function pointer, so that the conditionals are skipped on each call. Breaking up this large function also helps the compiler a lot, because it can optimize several small functions better than it can analyze one large function body.
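
To illustrate the idea, here is a minimal sketch of that kind of dispatch. It is not ugrep's actual code: the class Matcher, the routines advance_none, advance_char and advance_string, and the count() helper are all hypothetical names made up for this example. The specialized routine is chosen once at construction, so each call to advance() goes through a member function pointer instead of re-testing the pattern kind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical matcher for illustration only; these are not ugrep's names.
class Matcher {
 public:
  explicit Matcher(const std::string& needle) : needle_(needle) {
    // Pick the specialized routine once, so the hot loop does not branch
    // on the pattern kind for every call.
    if (needle_.empty())
      advance_ = &Matcher::advance_none;
    else if (needle_.size() == 1)
      advance_ = &Matcher::advance_char;
    else
      advance_ = &Matcher::advance_string;
  }

  // Offset of the next match at or after pos, or std::string::npos.
  std::size_t advance(const std::string& text, std::size_t pos) const {
    assert(advance_ != nullptr);
    return (this->*advance_)(text, pos);
  }

  // Count non-overlapping matches by calling advance() repeatedly.
  std::size_t count(const std::string& text) const {
    const std::size_t step = needle_.empty() ? 1 : needle_.size();
    std::size_t n = 0;
    std::size_t pos = 0;
    while ((pos = advance(text, pos)) != std::string::npos) {
      ++n;
      pos += step;
    }
    return n;
  }

 private:
  using AdvanceFn =
      std::size_t (Matcher::*)(const std::string&, std::size_t) const;

  // Empty pattern: every position up to and including the end matches.
  std::size_t advance_none(const std::string& text, std::size_t pos) const {
    return pos <= text.size() ? pos : std::string::npos;
  }

  // Single character: delegate to memchr.
  std::size_t advance_char(const std::string& text, std::size_t pos) const {
    if (pos >= text.size()) return std::string::npos;
    const void* p =
        std::memchr(text.data() + pos, needle_[0], text.size() - pos);
    if (p == nullptr) return std::string::npos;
    return static_cast<std::size_t>(static_cast<const char*>(p) - text.data());
  }

  // Longer string: delegate to std::string::find.
  std::size_t advance_string(const std::string& text, std::size_t pos) const {
    return text.find(needle_, pos);
  }

  std::string needle_;
  AdvanceFn advance_ = nullptr;
};
```

For example, Matcher("rol").count("rolling roller") walks the text with advance_string only. Each routine is small enough for the compiler to analyze on its own, which is the point of the refactoring; whether this is faster in practice depends on the compiler and target and would need to be measured.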

A bit of experimentation shows significant speed improvements are attainable on ARM64 NEON at least. So it is worth the effort to refactor this code that is not fully optimized by the compiler.

Even adding a dummy printf() statement makes the code run faster (!) despite the overhead of IO. So yeah, compiler optimizations aren't kicking in as much as I want them to at the moment. On a more serious note, this is not new to me. I taught graduate-level high-performance computing for several years. I will follow (my own) advice more closely in the next release cycles. It's just work, not difficult to do.

With these optimizations, and by omitting line counting when possible (such as for option -c), searching a 13GB file goes from

$ time ugrep -c rol en.txt
1171415
        4.54 real         2.86 user         1.40 sys

to a much lower timing

$ time ugrep -c rol en.txt
1171415
        2.40 real         0.83 user         1.39 sys

which runs about 90% faster in elapsed time (4.54s versus 2.40s real) on AArch64/NEON. Other search options will see speedups of anywhere from 20% to 100% on AArch64/NEON. Because the compiler's register allocation, instruction scheduling and alias analysis are improved, I expect these changes will also speed up searching with SSE2/AVX2. A quick test confirms this, with the same runs on Intel macOS giving a 15% speedup, and a 90% speedup when searching for the word "the".
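
For reference, the speedup figure can be worked out from the two AArch64/NEON timings shown earlier. This is just arithmetic on the posted numbers, with T denoting the time reported by the time command before (old) and after (new) the change:

```latex
\[
\frac{T^{\text{real}}_{\text{old}}}{T^{\text{real}}_{\text{new}}}
  = \frac{4.54}{2.40} \approx 1.89
\qquad\text{(about 89\% faster, rounded to 90\%)}
\]
\[
\frac{T^{\text{user}}_{\text{old}}}{T^{\text{user}}_{\text{new}}}
  = \frac{2.86}{0.83} \approx 3.45
\qquad
\frac{T^{\text{sys}}_{\text{old}}}{T^{\text{sys}}_{\text{new}}}
  = \frac{1.40}{1.39} \approx 1.01
\]
```

The gain is entirely in user time, which is where the search code runs. System time is essentially unchanged, so it now accounts for more than half of the remaining elapsed time.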

Now I have to find time to work on this. Stay tuned!

genivia-inc commented 3 months ago

OK, implemented and mostly tested over the weekend. Still some work to do. The executable is no larger, but it is faster. This update will be a lot faster on ARM devices that support NEON and AArch64.

All should be ready by next week to release 6.0.

genivia-inc commented 3 months ago

The ugrep 6.0 benchmarks are already posted: https://github.com/Genivia/ugrep-benchmarks

This shows that ugrep is one of the fastest greps. Please note that no grep can (or should) claim to always be the fastest, because different algorithms are involved, each with pros and cons.

Ugrep 6.0 will be released soon!