Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License

Fix avx512bw configure test #386

Closed (hwti closed this 3 months ago)

hwti commented 3 months ago

Unlike clang, gcc fails when using `__m512` instead of `__m512i`.
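To illustrate the type mismatch, here is a minimal sketch of the kind of probe a configure test might compile. It is not the actual test from ugrep's configure script: the function name `avx512bw_probe` and the `__AVX512BW__` guard are assumptions made for this example. The point is that the AVX512BW byte-compare intrinsic takes integer vectors (`__m512i`), while `__m512` is the 16 x float vector type, which gcc refuses to convert implicitly.

```cpp
// Hypothetical configure-style probe (illustrative only, not ugrep's real test).
#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

// Returns true only when this translation unit was built with AVX512BW
// enabled (for example with -mavx512bw) and the byte compare matches.
bool avx512bw_probe()
{
#if defined(__AVX512BW__)
  // Integer byte vectors must be declared as __m512i.
  // Declaring them as __m512 (the float type) is what gcc rejects.
  __m512i a = _mm512_set1_epi8('a');
  __m512i b = _mm512_set1_epi8('a');
  // Compare all 64 bytes; each equal byte sets one bit of the mask.
  __mmask64 m = _mm512_cmpeq_epi8_mask(a, b);
  return m == ~static_cast<__mmask64>(0);
#else
  return false;
#endif
}
```

A configure check would normally only need this to compile. Running it requires a CPU that supports AVX512BW whenever the guarded branch is enabled.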

hwti commented 3 months ago

Since the avx512 code was never enabled with gcc, does it need tests to make sure it's actually faster than avx2? Obviously it might depend on the machine. I can test on Tiger Lake (i7-11850H, a mobile CPU, so not optimal for reproducible benchmarks).

genivia-inc commented 3 months ago

Thank you for your feedback and for the patch.

The avx512bw is claimed to be faster by this article: http://0x80.pl/articles/simd-friendly-karp-rabin.html

It looked worthwhile to use. In fact, this is the only piece of code I borrowed from a legitimate source. Everything else in ugrep is written "in house" from scratch, including the SIMD optimizations, tar/zip/pax and decompression code. Some parts I wrote some time ago for other open source projects, such as RE/flex and some others.

hwti commented 3 months ago

> The avx512bw is claimed to be faster by this article: http://0x80.pl/articles/simd-friendly-karp-rabin.html

I was thinking about the lower clock speeds, and the clock/voltage transitions that stall or slow down the CPU (see for example https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html ), even if newer CPUs are probably less affected.