Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License

Escaped space causes a regex syntax error #360

Closed NightMachinery closed 5 months ago

NightMachinery commented 6 months ago
❯ echo 'hi h ' | ugrep --bool --perl-regexp -e 'h\ '
ugrep: error: error at position 6
(?m)h\ 
      \___invalid escape

Note that Perl's quotemeta escapes spaces:

❯ echo -n 'h ' | perl -pe '$_=quotemeta'
h\

This is even needed when using --bool.

genivia-inc commented 6 months ago

I could change that in ugrep, but [ ] suffices to escape a space.

Escapes are typically reserved for meta characters. Escapes for regular characters are often undefined, so it is not surprising that GNU grep also does not accept the escaped space as valid BRE and ERE:

$ ggrep 'foo\ bar' some.txt
ggrep: warning: stray \ before white space
$ ggrep -E 'foo\ bar' some.txt
ggrep: warning: stray \ before white space

So this is not a bug.

By the way, in boolean mode (--bool) you can quote a pattern to match "h " (h followed by a space) literally:

ugrep --bool --perl-regexp -e '"h "'

Or use \h which is a space or a tab.
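
As a minimal sketch of the [ ] approach: the example below uses plain grep -E instead of ugrep, which is an assumption made only so that it runs without ugrep installed. A bracket expression containing a single space is standard ERE syntax, so no backslash is needed. The sample input is made up for illustration.

```shell
# Sample input (illustrative): only the first line has "h" followed by a space.
# 'h[ ]' matches a literal space through a bracket expression, with no backslash.
matches=$(printf 'hi h \nhih\n' | grep -E 'h[ ]')
printf '%s\n' "$matches"
```

Only the line containing "h" followed by a space is selected; "hih" is not.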

NightMachinery commented 6 months ago

@genivia-inc I'd appreciate it if you could change it. I like to escape my patterns using perl (I have it automated with a hotkey in my editor), and I find \ more intuitive than [ ] (and easier to type). --perl-regexp should be compatible with perl IMO, not ggrep.
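
Until escaped spaces are accepted, one possible workaround for quotemeta-style patterns is to rewrite each "\ " as "[ ]" before passing the pattern on. This is only a sketch: space_to_bracket is a hypothetical helper name, not part of ugrep or Perl, and it assumes the input comes from quotemeta, where every space is preceded by exactly one escaping backslash.

```shell
# Hypothetical helper (not part of ugrep): turn every backslash-escaped space
# into the bracket expression "[ ]". Assumes quotemeta-style input, in which
# a backslash directly before a space is always the escape for that space.
space_to_bracket() {
  sed 's/\\ /[ ]/g'
}

# 'h\ i' is what quotemeta would produce for the literal text "h i".
pattern=$(printf '%s\n' 'h\ i' | space_to_bracket)
printf '%s\n' "$pattern"
```

The rewritten pattern could then be passed on, for example with ugrep --bool --perl-regexp -e "$pattern".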

On double quotes in --bool; \" escapes them, right?

genivia-inc commented 6 months ago

Yes, for boolean mode the escape for space makes sense, because space has a meaning there. Also " has a meaning there, so \" is supported.

This is not my personal opinion about what is best; I am just saying that in general you may not want to use \ to escape a space. It works with Perl regex, but with pretty much nothing else. It can also lead to problems when a \ is specified by accident. That is why GNU grep gives a "stray \" warning. The standards are stricter about this, on purpose.

genivia-inc commented 6 months ago

It is great to have feedback like this.

I'll update ugrep to accept \-escaping for space and tab in a future release. The spacing should make it clear that the \ is not a stray escape of the kind GNU grep complains about.

acelticsfan commented 5 months ago

Note that a search string with "\s" also works like "[ ]" when using the --bool option, and avoids the need for an escaped space.