Genivia / ugrep

NEW ugrep 6.2: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License

Faster ugrep 4.1 performance report (preview - not yet released) #289

Closed genivia-inc closed 10 months ago

genivia-inc commented 10 months ago

Ugrep 4.1 is most likely (pretty much always...?) faster than other fast grep tools when #284, #287, and #288 are applied and a few additional performance tweaks are made, such as using one worker thread less to better balance the workers in recursive searches. Note that -ABC is not as fast yet (unoptimized, see notes below) and leading wildcard patterns such as \w+foo are not optimized at all yet, see #288.

In addition to the OpenSSL 3.1.2 source code repo to compare recursive search speeds, a larger source code repo was added to the updated benchmark tests (swift-swift-5.8.1-RELEASE).

Since I am on a break, I don't have the means or time to release an update for about 3 weeks. After that you can try this yourself by downloading and running the ugrep-benchmarks scripts.

performance reports

Updated benchmarks are automatically generated and published when a new version of ugrep is released. Last updated: 2023-08-29

performance report x64

performance report arm64

Intel machine:

./install.sh # expand source code repo in corpi dir and create archives to search
./bench.sh > report_x64.md
./collect.awk < report_x64.md

ARM64 machine:

./install.sh # expand source code repo in corpi dir and create archives to search
./bench.sh > report_arm64.md
./collect.awk < report_arm64.md

the install.sh script requires the following compression utilities:

WARNING performance results are meaningless when the host machine executes other tasks that load the CPU; quit all running applications first and check for running background processes (with e.g. top) before running ./bench.sh
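The warning above is about measurement noise in elapsed real time. A minimal Python sketch of how one could check for that noise is shown here. It is not how bench.sh measures anything. The helper names `time_command` and `summarize` are hypothetical, and the placeholder command should be swapped for a real search command.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the elapsed wall-clock times in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded so terminal speed does not affect the timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return (best, median, spread) where spread = (max - min) / median."""
    best = min(times)
    median = statistics.median(times)
    spread = (max(times) - best) / median if median > 0 else 0.0
    return best, median, spread


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere (an assumption, not a benchmark).
    cmd = [sys.executable, "-c", "pass"]
    best, median, spread = summarize(time_command(cmd, runs=5))
    print(f"best {best:.3f}s  median {median:.3f}s  spread {spread:.0%}")
```

A large spread between runs of the same command suggests the machine is busy and the numbers should not be trusted.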

important notes:

performance report x64

found ugrep 1172944 byte executable located at /usr/local/bin/ugrep

ugrep 4.1.0 x86_64-apple-darwin19.6.0 +avx2 +pcre2jit +zlib +bzip2 +lzma +lz4 +zstd
License BSD-3-Clause: <https://opensource.org/licenses/BSD-3-Clause>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>

found rg 6075312 byte executable located at /opt/local/bin/rg

ripgrep 13.0.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

found ag 84764 byte executable located at /usr/local/bin/ag

ag version 2.2.0

Features:
  +jit +lzma +zlib

found ggrep 263184 byte executable located at /usr/local/bin/ggrep

ggrep (GNU grep) 3.11
Packaged by Homebrew
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see
<https://git.savannah.gnu.org/cgit/grep.git/tree/AUTHORS>.

grep -P uses PCRE2 10.42 2022-12-11

results for large text file search

grepping rol elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.03 0.03 0.02 0.03 0.03 0.03 0.03 0.01
rg 0.03 0.03 0.04 0.06 0.06 0.06 0.06 0.02
ag 0.67 0.66 0.40 0.35 0.35 0.35 0.11 0.11
ggrep 0.11 0.13 0.14 0.15 0.54 0.53 0.52 0.07

grepping the elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.08 0.09 0.16 0.09 0.11 0.20 0.07 0.00
rg 0.07 0.11 0.29 0.20 0.22 1.54 0.13 0.01
ag 3.70 3.66 1.06 3.57 3.58 1.00 0.15 0.15
ggrep 0.13 0.17 0.70 0.34 0.93 3.99 0.77 0.00

grepping cycles|semigroups elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.04 0.03 0.03 0.03 0.03 0.04 0.04 0.00
rg 0.04 0.03 0.03 0.26 0.06 0.07 0.07 0.01
ag 0.42 0.42 0.40 0.43 0.42 0.41 0.17 0.17
ggrep 0.21 0.22 0.22 0.33 0.29 0.30 0.28 0.00

grepping ab(cd?)? elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.04 0.05 0.05 0.03 0.03 0.03 0.03 0.00
rg 0.09 0.11 0.14 0.10 0.12 0.12 0.11 0.01
ag 1.85 1.85 0.62 0.44 0.44 0.43 0.18 0.18
ggrep 0.11 0.13 0.35 0.37 1.70 1.73 1.72 0.00

grepping ro[a-z]*ds elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.06 0.06 0.06 0.06 0.06 0.05 0.06 0.00
rg 0.07 0.07 0.07 0.17 0.25 0.24 0.25 0.01
ag 0.44 0.46 0.40 0.41 0.41 0.39 0.15 0.15
ggrep 0.36 0.38 0.38 0.41 0.84 0.86 0.83 0.00

grepping (19|20)[0-9]{2}/(0[1-9]|1[012])|(0[1-9]|1[012])/(19|20)[0-9]{2} elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.00
rg 0.06 0.06 0.06 0.16 0.15 0.16 0.15 0.01
ag 0.41 0.41 0.40 0.41 0.41 0.39 0.14 0.14
ggrep 0.05 0.07 0.07 0.09 0.10 0.11 0.08 0.00

grepping (https?://|www\.)[-a-zA-Z0-9@:%._+~#=]{1,253}\.[-a-zA-Z0-9]{2,}\.[][a-zA-Z0-9()@:%_+.~#?&/=\-]+ elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.07 0.07 0.06 0.08 0.08 0.08 0.08 0.02
rg fail fail fail fail fail fail fail fail
ag 0.80 0.79 0.55 0.80 0.80 0.55 0.17 0.17
ggrep 5.71 5.79 11.65 5.89 6.04 12.31 5.95 0.01

grepping ^={2,4}[^=].* elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.06 0.07 0.07 0.06 0.06 0.06 0.06 0.01
rg 0.05 0.06 0.07 0.14 0.13 3.42 0.12 0.01
ag 0.40 0.40 0.39 fail fail fail fail fail
ggrep 0.12 0.14 0.25 0.14 0.15 0.27 0.12 0.00

grepping '' elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.13 0.17 0.18 0.17 0.17 0.17 0.08 0.00
rg 0.14 0.34 20.84 1.39 1.39 fail 0.97 0.01
ag fail fail fail fail fail fail fail 1.18
ggrep 0.20 0.34 17.20 1.57 3.43 47.25 3.03 0.00

grepping ^$ elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.05 0.06 0.06 0.06 0.07 0.06 0.05 0.00
rg 0.20 0.24 0.26 0.37 0.37 0.78 0.31 0.01
ag 0.40 0.40 0.38 fail fail fail fail fail
ggrep 0.10 0.14 0.15 2.86 4.66 4.76 4.66 0.00

results for large text file search for words from files

grepping -fwords/1.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.06 0.06 0.06 0.06 0.12 0.24 0.08 0.00
rg 0.06 0.07 0.09 0.17 0.28 2.66 0.20 0.01
ggrep 0.14 0.16 0.21 0.17 0.98 4.28 0.85 0.00

grepping -fwords/2.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.25 0.26 0.25 0.25 0.28 0.27 0.27 0.00
rg 0.14 0.33 18.81 0.40 0.35 0.34 0.35 0.02
ggrep 0.95 0.96 0.96 0.96 0.43 0.49 0.41 0.00

grepping -fwords/3.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.27 0.27 0.28 0.27 0.40 0.40 0.42 0.00
rg 1.96 1.97 2.36 0.38 0.43 0.41 0.40 0.04
ggrep 1.38 1.41 1.64 1.49 9.80 10.08 9.63 0.02

grepping -fwords/4.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.24 0.24 0.24 0.26 0.47 0.48 0.46 0.02
rg 0.35 0.35 0.35 1.57 1.90 2.05 1.83 0.17
ggrep fail fail fail fail fail fail fail fail

results for large text file search with formatted output

grepping Sherlock|Holmes elapsed real time (s)

search --json --csv --xml --hex
ugrep 0.03 0.03 0.02 0.03
rg 0.04 fail fail fail
ag 0.40 fail 0.40 fail

results for large text file search with replaced output

grepping flop elapsed real time (s)

search --replace=flip
ugrep 0.02
rg 0.03

results for large text file search with context

grepping ^$ elapsed real time (s)

search -A1 -B1 -C1 -winA1 -winB1 -winC1
ugrep 0.19 0.23 0.22 0.20 0.24 0.24
rg 0.26 0.25 0.26 0.47 0.49 0.49
ag 0.49 0.72 0.73 fail fail fail
ggrep 0.15 0.18 0.17 4.91 4.96 4.95

grepping Sherlock|Holmes elapsed real time (s)

search -A1 -B1 -C1 -winA1 -winB1 -winC1
ugrep 0.13 0.17 0.17 0.14 0.18 0.17
rg 0.03 0.04 0.04 0.09 0.09 0.09
ag 0.39 0.57 0.59 0.42 0.61 0.60
ggrep 0.13 0.13 0.13 0.29 0.30 0.29

results for OpenSSL source code repo directory search

grepping FIXME|TODO elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.04 0.04 0.04 0.04 0.04 0.04
rg 0.03 0.04 0.04 0.04 0.04 0.05
ag 0.05 0.06 0.05 0.06 0.05 0.05
ggrep 0.15 0.16 0.23 0.23 0.22 0.21

grepping char|int|long|size_t|void elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.05 0.05 0.05 0.05 0.05 0.04
rg 0.05 0.07 0.08 0.15 0.07 0.06
ag 0.50 0.34 0.35 0.23 0.08 0.08
ggrep 0.24 0.32 0.53 0.73 0.48 0.19

grepping ssl-?3(\.[0-9]+)? elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.04 0.04 0.04 0.04 0.04 0.04
rg 0.03 0.04 0.07 0.07 0.07 0.07
ag 0.06 0.05 0.05 0.07 0.05 0.05
ggrep 0.13 0.13 0.16 0.15 0.15 0.14

results for Swift source code repo directory search

grepping _(RUN|LIB|NAM)[A-Z_]+ elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.18 0.18 0.18 0.18 0.20 0.16
rg 0.18 0.19 0.20 0.20 0.20 0.19
ag 0.38 0.49 0.39 0.37 0.40 0.37
ggrep 0.62 0.77 0.88 0.87 0.84 0.79

grepping String|Int|Double|Array|Dictionary elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.23 0.20 0.23 0.23 0.21 0.17
rg 0.20 0.22 0.28 0.36 0.26 0.20
ag 1.35 0.75 1.02 0.77 0.48 0.55
ggrep 0.86 1.02 2.63 3.11 2.53 0.97

grepping (class|struct)\sS[a-z]+T elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.18 0.18 0.18 0.18 0.20 0.17
rg 0.18 0.28 0.38 0.38 0.37 0.37
ag 0.42 0.38 0.47 0.50 0.38 0.38
ggrep 0.81 0.88 1.20 1.20 1.16 1.09

grepping for\s[a-z]+\sin elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.20 0.18 0.18 0.18 0.20 0.17
rg 0.18 0.18 0.29 0.29 0.29 0.28
ag 0.48 0.43 0.44 0.50 0.41 0.39
ggrep 0.75 0.74 1.04 1.06 1.03 0.89

results for bz2 compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 3.23 3.25 3.24 3.23 0.47 0.46
rg 3.37 3.36 3.38 3.36 0.47 0.49
ag fail fail fail fail fail fail

results for gz compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.49 0.49 0.49 0.49 0.07 0.07
rg 0.40 0.41 0.41 0.42 0.06 0.07
ag fail fail fail fail fail fail

results for lz4 compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.10 0.10 0.10 0.10 0.02 0.02
rg 0.12 0.16 0.12 0.16 0.03 0.05
ag fail fail fail fail fail fail

results for xz compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 1.44 1.56 1.56 1.46 0.21 0.21
rg 1.64 1.62 1.61 1.56 0.23 0.23
ag fail fail fail fail fail fail

results for zstd compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.19 0.19 0.19 0.19 0.03 0.03
rg 0.18 0.18 0.16 0.16 0.03 0.04
ag fail fail fail fail fail fail

results for zip archived repo search

grepping FIXME|TODO elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.30 0.30 0.29 0.29 0.29 0.29
rg fail fail fail fail fail fail
ag fail fail fail fail fail fail

results for tar archived repo search

grepping FIXME|TODO elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.15 0.15 0.14 0.14 0.14 0.14
rg fail fail fail fail fail fail
ag fail fail fail fail fail fail

results for compressed tarball search

grepping FIXME|TODO elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.37 0.36 0.35 0.35 0.35 0.36
rg fail fail fail fail fail fail
ag fail fail fail fail fail fail

performance report arm64

found ugrep 1091554 byte executable located at /usr/local/bin/ugrep

ugrep 4.1.0 arm-apple-darwin21.6.0 +neon/AArch64 +pcre2jit +zlib +bzip2 +lzma +lz4 +zstd
License BSD-3-Clause: <https://opensource.org/licenses/BSD-3-Clause>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>

found rg 5571088 byte executable located at /opt/local/bin/rg

ripgrep 13.0.0
-SIMD -AVX (compiled)

found ag 111344 byte executable located at /opt/homebrew/bin/ag

ag version 2.2.0

Features:
  +jit +lzma +zlib

found ggrep 266352 byte executable located at /opt/homebrew/bin/ggrep

ggrep (GNU grep) 3.11
Packaged by Homebrew
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see
<https://git.savannah.gnu.org/cgit/grep.git/tree/AUTHORS>.

grep -P uses PCRE2 10.42 2022-12-11

results for large text file search

grepping rol elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.03 0.03 0.03 0.02 0.02 0.02 0.02 0.00
rg 0.08 0.08 0.09 0.10 0.14 0.14 0.14 0.02
ag 0.54 0.54 0.43 0.40 0.41 0.41 0.16 0.17
ggrep 0.10 0.12 0.13 0.13 0.38 0.38 0.36 0.05

grepping the elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.06 0.06 0.11 0.07 0.07 0.13 0.05 0.00
rg 0.06 0.08 0.21 0.14 0.18 1.22 0.14 0.00
ag 1.94 1.94 0.84 1.89 1.89 0.81 0.21 0.21
ggrep 0.10 0.13 0.40 0.23 0.52 2.32 0.47 0.00

grepping cycles|semigroups elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.03 0.03 0.03 0.03 0.04 0.04 0.03 0.00
rg 0.20 0.20 0.20 0.25 0.22 0.22 0.21 0.01
ag 0.52 0.52 0.52 0.51 0.50 0.50 0.27 0.27
ggrep 0.15 0.17 0.17 0.26 0.24 0.24 0.22 0.00

grepping ab(cd?)? elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.04 0.04 0.04 0.03 0.03 0.03 0.02 0.00
rg 0.13 0.14 0.18 0.12 0.22 0.23 0.22 0.00
ag 1.08 1.09 0.61 0.49 0.48 0.49 0.23 0.21
ggrep 0.08 0.11 0.21 0.23 1.19 1.18 1.09 0.00

grepping ro[a-z]*ds elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.00
rg 0.16 0.16 0.16 0.22 0.27 0.27 0.26 0.00
ag 0.44 0.44 0.42 0.41 0.41 0.40 0.17 0.16
ggrep 0.25 0.27 0.28 0.30 0.60 0.60 0.58 0.00

grepping (19|20)[0-9]{2}/(0[1-9]|1[012])|(0[1-9]|1[012])/(19|20)[0-9]{2} elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.00
rg 0.05 0.06 0.06 0.14 0.14 0.14 0.14 0.00
ag 0.38 0.38 0.37 0.36 0.36 0.36 0.12 0.12
ggrep 0.04 0.05 0.06 0.07 0.08 0.08 0.06 0.00

grepping (https?://|www\.)[-a-zA-Z0-9@:%._+~#=]{1,253}\.[-a-zA-Z0-9]{2,}\.[][a-zA-Z0-9()@:%_+.~#?&/=\-]+ elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.05 0.05 0.05 0.05 0.06 0.06 0.05 0.00
rg fail fail fail fail fail fail fail fail
ag 0.60 0.60 0.50 0.59 0.61 0.50 0.19 0.19
ggrep 3.41 3.42 6.95 3.53 3.68 7.45 3.65 0.00

grepping ^={2,4}[^=].* elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.04 0.04 0.04 0.05 0.04 0.04 0.04 0.00
rg 0.04 0.05 0.06 0.09 0.09 2.89 0.08 0.00
ag 0.42 0.42 0.41 fail fail fail fail fail
ggrep 0.05 0.08 0.11 0.08 0.08 0.12 0.05 0.00

grepping '' elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.08 0.10 0.10 0.10 0.10 0.10 0.05 0.00
rg 0.09 0.18 8.67 0.88 0.88 fail 0.75 0.00
ag fail fail fail fail fail fail fail 2.03
ggrep 0.11 0.19 7.44 0.70 1.20 15.82 1.03 0.00

grepping ^$ elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.04 0.05 0.04 0.05 0.05 0.04 0.04 0.00
rg 0.19 0.21 0.22 0.29 0.29 0.62 0.27 0.00
ag 0.41 0.41 0.40 fail fail fail fail fail
ggrep 0.06 0.09 0.09 1.54 2.98 2.99 2.95 0.00

results for large text file search for words from files

grepping -fwords/1.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.05 0.05 0.05 0.05 0.10 0.21 0.07 0.00
rg 0.16 0.18 0.25 0.23 0.21 2.15 0.17 0.00
ggrep 0.08 0.10 0.13 0.11 0.55 2.44 0.49 0.00

grepping -fwords/2.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.13 0.13 0.13 0.13 0.15 0.15 0.15 0.00
rg 0.09 0.17 7.67 0.31 0.27 0.27 0.27 0.01
ggrep 0.63 0.65 0.66 0.65 0.32 0.36 0.30 0.00

grepping -fwords/3.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.14 0.14 0.15 0.14 0.23 0.24 0.23 0.00
rg 0.17 0.17 0.20 0.29 0.30 0.31 0.30 0.01
ggrep 0.88 0.91 1.07 1.00 6.29 6.52 6.28 0.01

grepping -fwords/4.txt elapsed real time (s)

search (none) -n -no -wn -win -wino -cwi -lwi
ugrep 0.19 0.19 0.19 0.19 0.32 0.34 0.31 0.01
rg 0.20 0.21 0.22 0.84 1.05 1.14 1.05 0.06
ggrep fail fail fail fail fail fail fail fail

results for large text file search with formatted output

grepping Sherlock|Holmes elapsed real time (s)

search --json --csv --xml --hex
ugrep 0.02 0.02 0.02 0.02
rg 0.03 fail fail fail
ag 0.34 fail 0.33 fail

results for large text file search with replaced output

grepping flop elapsed real time (s)

search --replace=flip
ugrep 0.02
rg 0.04

results for large text file search with context

grepping ^$ elapsed real time (s)

search -A1 -B1 -C1 -winA1 -winB1 -winC1
ugrep 0.10 0.12 0.12 0.10 0.13 0.13
rg 0.22 0.23 0.23 0.33 0.34 0.35
ag 0.45 0.53 0.54 fail fail fail
ggrep 0.09 0.12 0.11 3.07 3.10 3.05

grepping Sherlock|Holmes elapsed real time (s)

search -A1 -B1 -C1 -winA1 -winB1 -winC1
ugrep 0.06 0.09 0.09 0.07 0.10 0.10
rg 0.03 0.03 0.03 0.23 0.23 0.23
ag 0.33 0.41 0.41 0.50 0.59 0.59
ggrep 0.09 0.09 0.09 0.23 0.23 0.23

results for OpenSSL source code repo directory search

grepping FIXME|TODO elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.03 0.03 0.03 0.03 0.04 0.04
rg 0.05 0.05 0.04 0.04 0.04 0.05
ag 0.04 0.04 0.04 0.04 0.04 0.04
ggrep 0.11 0.12 0.17 0.17 0.16 0.15

grepping char|int|long|size_t|void elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.04 0.04 0.04 0.04 0.04 0.04
rg 0.04 0.04 0.05 0.09 0.04 0.04
ag 0.32 0.23 0.23 0.18 0.05 0.05
ggrep 0.16 0.21 0.31 0.43 0.28 0.13

grepping ssl-?3(\.[0-9]+)? elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.04 0.04 0.03 0.03 0.04 0.04
rg 0.06 0.05 0.05 0.05 0.06 0.05
ag 0.04 0.04 0.04 0.04 0.04 0.04
ggrep 0.09 0.10 0.11 0.11 0.09 0.09

results for Swift source code repo directory search

grepping _(RUN|LIB|NAM)[A-Z_]+ elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.20 0.18 0.21 0.20 0.20 0.20
rg 0.25 0.21 0.24 0.20 0.29 0.22
ag 0.24 0.22 0.21 0.21 0.24 0.25
ggrep 0.43 0.54 0.55 0.56 0.50 0.49

grepping String|Int|Double|Array|Dictionary elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.23 0.19 0.21 0.24 0.21 0.22
rg 0.20 0.22 0.22 0.31 0.23 0.19
ag 0.86 0.49 0.68 0.58 0.25 0.25
ggrep 0.54 0.68 1.56 1.79 1.48 0.59

grepping (class|struct)\sS[a-z]+T elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.17 0.19 0.19 0.20 0.22 0.20
rg 0.28 0.21 0.22 0.27 0.26 0.22
ag 0.22 0.22 0.22 0.23 0.23 0.23
ggrep 0.56 0.64 0.83 0.82 0.76 0.73

grepping for\s[a-z]+\sin elapsed real time (s)

search -n/-nr -wn/-wnr -win/-winr -wino/-winor -cwi/-cwir -lwi/-lwir
ugrep 0.18 0.18 0.18 0.21 0.20 0.18
rg 0.23 0.23 0.22 0.22 0.22 0.24
ag 0.34 0.26 0.26 0.26 0.23 0.22
ggrep 0.52 0.53 0.72 0.71 0.66 0.59

results for bz2 compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 1.97 1.98 1.97 1.97 0.28 0.28
rg 2.00 2.00 2.00 2.00 0.27 0.29
ag fail fail fail fail fail fail

results for gz compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.30 0.30 0.30 0.30 0.04 0.04
rg 0.30 0.30 0.30 0.30 0.04 0.05
ag fail fail fail fail fail fail

results for lz4 compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.06 0.06 0.06 0.06 0.01 0.01
rg 0.11 0.17 0.12 0.17 0.02 0.03
ag fail fail fail fail fail fail

results for xz compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 1.09 1.09 1.08 1.10 0.16 0.16
rg 1.12 1.12 1.12 1.17 0.17 0.16
ag fail fail fail fail fail fail

results for zstd compressed large text file search

grepping landsnail elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.15 0.14 0.14 0.14 0.02 0.02
rg 0.12 0.13 0.12 0.13 fail fail
ag fail fail fail fail fail fail

results for zip archived repo search

grepping FIXME|TODO elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.21 0.21 0.20 0.20 0.19 0.19
rg fail fail fail fail fail fail
ag fail fail fail fail fail fail

results for tar archived repo search

grepping FIXME|TODO elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.10 0.09 0.09 0.10 0.08 0.07
rg fail fail fail fail fail fail
ag fail fail fail fail fail fail

results for compressed tarball search

grepping FIXME|TODO elapsed real time (s)

search -z -zwin -zc -zwic -zl -zwil
ugrep 0.23 0.23 0.22 0.21 0.21 0.21
rg fail fail fail fail fail fail
ag fail fail fail fail fail fail
GwynethLlewelyn commented 10 months ago

πŸ‘ Let me say, you continue to keep us impressed... sure, there might be one or two cases where ugrep, in some scenarios, is ever-so-slightly slower (one wonders what's going on with gz compression...), but in this 'game of benchmarks', you get extra points for consistency of overall performance under several very different scenarios.

I was particularly impressed by the speed of your regexp processing! Surely the pcre2jit library is clearly the winner here. One wonders how Boost.Regex would fare in this scenario? In any case, if pcre2jit is that good, why doesn't everybody use it? :-) (The results, after all, speak for themselves...)

genivia-inc commented 10 months ago

Thank you for reading my post and for commenting. I don't expect many will read all these details and find it all that interesting, except maybe those who love this field of work ;-)

I'm not 100% happy with the performance yet and have been thinking about it for over 2 years. Since I started implementing new features that people wanted and that I also wanted to add, I had put additional performance enhancements on the back burner until later. Some of the regex cases that are left to optimize are a bit "artificial", but nevertheless important enough to optimize properly.

On your comment about PCRE2/Boost.Regex: note that ugrep's default regex engine is RE/flex, not PCRE2 jit. PCRE2 jit is only used with option -P for Perl-compatible regex. Boost.Regex is an option if PCRE2 is not installed. Boost.Regex is pretty good too, but there is no means to perfectly implement searching streaming input since it doesn't fully support partial matching like PCRE2 does (a long time ago I discussed this with the Boost.Regex author, but the full partial matching requirements are still not completed). The default regex engine is purely DFA-based and pretty quick, see also the performance table for matching/scanning (not searching) on the RE/flex project page: https://github.com/Genivia/RE-flex
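To illustrate what "purely DFA-based" can mean for streaming input, here is a minimal Python sketch. It is not RE/flex code. It uses a hand-built automaton for the benchmark pattern ab(cd?)?, and the names `DELTA`, `ACCEPT`, and `StreamMatcher` are hypothetical. The automaton state is kept between calls, so a match that straddles two input chunks is still found.

```python
from collections import deque

# Hand-built DFA for ab(cd?)?  (state 0 is the start state).
DELTA = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3, (3, "d"): 4}
ACCEPT = {2, 3, 4}  # states reached after "ab", "abc", "abcd"


class StreamMatcher:
    """Finds leftmost-longest matches in input that arrives in chunks."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.state = 0      # current DFA state
        self.buf = ""       # characters of the match attempt in progress
        self.accepted = 0   # length of the longest accepted prefix of buf

    def feed(self, chunk):
        """Consume one chunk and return the matches completed so far."""
        matches = []
        queue = deque(chunk)
        while queue:
            ch = queue.popleft()
            nxt = DELTA.get((self.state, ch))
            if nxt is not None:           # one table lookup per character
                self.state = nxt
                self.buf += ch
                if nxt in ACCEPT:
                    self.accepted = len(self.buf)
                continue
            if not self.buf:              # ch cannot start a match: skip it
                continue
            if self.accepted:             # report the longest match seen
                matches.append(self.buf[:self.accepted])
                rest = self.buf[self.accepted:]
            else:                         # failed attempt: retry one char later
                rest = self.buf[1:]
            queue.extendleft(reversed(rest + ch))
            self._reset()
        return matches

    def finish(self):
        """Flush any match still pending at end of input."""
        matches = []
        while self.buf:
            if self.accepted:
                matches.append(self.buf[:self.accepted])
                rest = self.buf[self.accepted:]
            else:
                rest = self.buf[1:]
            self._reset()
            matches.extend(self.feed(rest))
        return matches


if __name__ == "__main__":
    matcher = StreamMatcher()
    found = []
    for chunk in ["xxab", "cd ab", "c a"]:   # "abcd" is split across chunks
        found += matcher.feed(chunk)
    found += matcher.finish()
    print(found)
```

A real engine compiles the transition table from the regex and is far more elaborate. The sketch only shows that the matcher needs no access to earlier chunks beyond its own small state.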

On your comment about gz: there is no significant difference in performance on the arm64, but the x64 gz-compressed search appears to be slower than expected. It might be a libz version difference, a suboptimal decompression buffer size, or some cache effect. I will look into that.
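One way to probe the buffer-size hypothesis in isolation is sketched below in Python. This exercises Python's zlib binding, not ugrep's decompression code, so it can only hint at whether buffer size matters on a given machine. The function name `gunzip_chunked`, the test data, and the buffer sizes are all assumptions made for illustration.

```python
import gzip
import time
import zlib


def gunzip_chunked(data, bufsize):
    """Decompress gzip data, feeding `bufsize` compressed bytes at a time."""
    decomp = zlib.decompressobj(wbits=31)  # 31 selects the gzip container
    out = bytearray()
    for i in range(0, len(data), bufsize):
        out += decomp.decompress(data[i:i + bufsize])
    out += decomp.flush()
    return bytes(out)


if __name__ == "__main__":
    # Synthetic input (an assumption); a real test would use the benchmark corpus.
    text = b"the quick brown fox jumps over the lazy dog\n" * 200_000
    packed = gzip.compress(text)
    for bufsize in (4 * 1024, 64 * 1024, 1024 * 1024):
        start = time.perf_counter()
        result = gunzip_chunked(packed, bufsize)
        elapsed = time.perf_counter() - start
        assert result == text
        print(f"bufsize {bufsize:>8}: {elapsed:.3f}s")
```

If the timings barely change across buffer sizes, the libz version or cache effects would be the more likely suspects.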

Edit: needed to clarify "artificial": I mean the cases left to optimize, not the cases benchmarked. Also added a note about gz.

GwynethLlewelyn commented 10 months ago

Oh... I admit that you lost me with some of the multitude of acronyms related to CPU microarchitecture :) I'm from the days when SSE2 was the new kid on the block :) - which means that I'm completely out of the loop and merely browsed very quickly across your post. Nevertheless, I can certainly appreciate the amount of testing data that you have accumulated. Obviously, I haven't reproduced your results :) and therefore take your word on it :)

But I think that's irrelevant; you can always skew benchmarks and statistics to show whatever you wish to show. What matters most here is the order of magnitude in many of the different tests - i.e. measuring 10x the speed difference will be perceptible, no matter what system you're running it on - while a 5-10% difference may just be too close to the overall error margin to be significant. I don't know - for a research paper, I suppose you'd be asked to provide much more than that (and, then again, I might be wrong on that assumption!).

While everybody loves extra features - and often these make a difference when choosing one option over another! - in my case I'm much more curious about how far you can push all those optimisations, what kind of limits you'll reach, and if you'll have to start trading off memory vs. CPU (possibly switched via a CLI option...?) in order to squeeze a bit more performance out of it...

Thanks for the clarification about the regex engine you're using (it shows that I really didn't bother to read the source code, doesn't it? 😁 ). In fact, I had never heard of RE/flex before, and I wonder - is it your own invention? If so, your results are certainly so impressive that I cannot but wonder why it hasn't been universally adopted everywhere :) Faster than RE2? Ouchie - ok, so that is really fast (because, well, RE2 cheats in order to get good results - by limiting the kind of regexes you can construct, focusing on the essential only... which always surprises me because I'm 'expecting' a larger syntax). Google must hate you :) In fact, I was going to ask if you did test RE2, but from the RE/flex README, it's clear that you did, and found that RE/flex is even faster than that.

I'm curious why you need to point out that RE/flex is "purely DFA-based". Never having had to implement a regex parser (gosh, I barely manage to get some expressions working on regex101.com...), I erroneously thought that all were DFAs (wasn't that what Larry used originally?)... well, possibly with the exception of RE2, where it's clear they did a lot of cheating in order to boost performance. I guess that - again - my ignorance in such matters is legendary :) And to think that I used to be a pretty reasonable student attending the compilation classes, and liked them a lot - as well as playing around with yacc and lex, before even Stallman wrote bison and flex. Gosh, I guess that I'm just too old and too outdated; but I thank you for your thorough explanations, I certainly learned quite a lot today!

As for gz's performance difference... well, it's certainly intriguing and worth looking into. I always assumed that whoever did the library for x64 just cross-compiled it to arm64, and never bothered much with the details, leaving the optimisation to the compiler. But I now realise that, when talking about achieving maximum performance, it might require some real knowledge of the underlying architecture in order to squeeze every bit of performance at the machine code level... and that might be lacking on the x64 version (while the arm64 version might take into account certain characteristics of the ARM processor and therefore work much faster).

genivia-inc commented 10 months ago

The latest ugrep v4.1 benchmark is now online: https://github.com/Genivia/ugrep-benchmarks

@GwynethLlewelyn

> I can certainly appreciate the amount of testing data that you have accumulated. Obviously, I haven't reproduced your results :) and therefore take your word on it :)

Thank you for your feedback and comments. I cannot possibly test on every contemporary computer architecture, so I picked a MacBook Pro x64 with AVX2 and LPDDR4 and an M1 Pro 10-core AArch64 with LPDDR5 as representative of common, popular x64 and ARM64 machines. This will give a fair performance comparison of grep tools. Rest assured, these numbers are not skewed and are reported "as is". Others should be able to reproduce these on similar machines within a reasonably small error margin.

> But I think that's irrelevant; you can always skew benchmarks and statistics to show whatever you wish to show. What matters most here is the order of magnitude in many of the different tests - i.e. measuring 10x the speed difference will be perceptible, no matter what system you're running it on - while a 5-10% difference may just be too close to the overall error margin to be significant. I don't know - for a research paper, I suppose you'd be asked to provide much more than that (and, then again, I might be wrong on that assumption!).

I agree that 10% faster or slower is not significant and usually falls well within the margin of system variance anyway. I didn't pick patterns for which ugrep specifically runs faster. For the time being I did avoid patterns similar to \w+foo that I know are slow because of backtracking, see #288 for work in progress.
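The distinction discussed here, between differences within roughly 10% and differences of 10x or more, can be applied mechanically to the report rows. The Python sketch below does so for the ugrep and rg rows of the x64 "grepping ''" table. The helper names `parse_row` and `classify` are hypothetical, and the thresholds are simply the 10% and 10x figures mentioned in this thread, not a statistical test.

```python
def parse_row(line):
    """Split a report row such as 'rg 0.14 fail' into ('rg', [0.14, None])."""
    tool, *cells = line.split()
    return tool, [None if c == "fail" else float(c) for c in cells]


def classify(ours, theirs, noise=0.10, big=10.0):
    """Label the difference between two elapsed times (seconds)."""
    if ours is None or theirs is None:
        return "n/a"                      # one of the tools failed
    if ours == 0 or theirs == 0:
        return "too fast to resolve"      # 0.00 at two-decimal precision
    ratio = theirs / ours
    if ratio >= big or ratio <= 1 / big:
        return "order of magnitude"
    if abs(ratio - 1) <= noise:
        return "within noise"
    return "measurable"


# Rows copied from the x64 report, table "grepping ''".
UGREP_ROW = "ugrep 0.13 0.17 0.18 0.17 0.17 0.17 0.08 0.00"
RG_ROW = "rg 0.14 0.34 20.84 1.39 1.39 fail 0.97 0.01"

if __name__ == "__main__":
    _, ugrep = parse_row(UGREP_ROW)
    _, rg = parse_row(RG_ROW)
    for a, b in zip(ugrep, rg):
        print(a, b, classify(a, b))
```

Only the "order of magnitude" and failed cells say something that is likely to hold on other machines. The "within noise" cells should be read as a tie.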

Benchmarking grep tools can never be exhaustive. The space is just too large to even contemplate that. There will always be some regex patterns and use cases for which one tool is faster than another due to the differences in matching algorithms and because heuristics are used. Recursive searches with threads may also differ due to caching effects and the chance timing of when a thread picks up a job and completes it to report results. Load balancing is tricky, which I've done with weighted round-robin using a "job stealing" strategy when worker threads become idle. Nevertheless, perfect load balancing is an NP-complete problem, so heuristics must be used, which have their own peculiarities.
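The effect of job stealing can be seen in a toy model. The Python sketch below is a single-threaded simulation, not ugrep's scheduler. It uses plain (unweighted) round-robin assignment, integer job costs that are made up for illustration, and a hypothetical function name `simulate`. An idle worker steals one job from the tail of the longest queue.

```python
from collections import deque


def simulate(job_costs, workers, steal=True):
    """Return the time until all jobs finish in a toy round-robin scheduler.

    job_costs are positive integers (time units per job, hypothetical).
    """
    queues = [deque() for _ in range(workers)]
    for i, cost in enumerate(job_costs):        # round-robin assignment
        queues[i % workers].append(cost)
    remaining = [0] * workers                   # time left on each current job
    clock = 0
    while True:
        for w in range(workers):
            if remaining[w] == 0:               # worker w is idle
                if not queues[w] and steal:
                    victim = max(range(workers), key=lambda v: len(queues[v]))
                    if queues[victim]:
                        # Take one job from the tail of the longest queue.
                        queues[w].append(queues[victim].pop())
                if queues[w]:
                    remaining[w] = queues[w].popleft()
        if all(r == 0 for r in remaining):      # nothing running, nothing queued
            return clock
        clock += 1
        remaining = [max(0, r - 1) for r in remaining]


if __name__ == "__main__":
    jobs = [8, 1, 1, 1, 8, 1, 1, 1]             # two large jobs land on worker 0
    print("without stealing:", simulate(jobs, 2, steal=False))
    print("with stealing:   ", simulate(jobs, 2, steal=True))
```

With these costs the total work is 22 units, so two workers cannot finish in fewer than 11. Stealing improves on plain round-robin but does not reach that bound here, which is the kind of heuristic peculiarity mentioned above.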