facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net
Other
23.16k stars 2.06k forks source link

zstdgrep parsing of grep options naively broken #2064

Open trantor opened 4 years ago

trantor commented 4 years ago

Describe the bug zstdgrep as a wrapper for zstdcat | grep should handle grep options properly. It doesn't in several cases. A few examples below.

To Reproduce

# The fact that the -e grep option can be specified multiple times
# for multiple pattern is not handled at all, nor is the use of equivalent
# long options with or without = between them and their arguments

$ echo start > /tmp/file
$ echo stop >> /tmp/file
$ grep -e start -e stop /tmp/file
start
stop
$ grep --regexp=start --regexp=stop /tmp/file
start
stop
$ grep --regexp start --regexp stop /tmp/file
start
stop
$ grep --regexp start --regexp stop -e start /tmp/file
start
stop
$ grep --regexp start --regexp=stop -e start /tmp/file
start
stop
$ zstdgrep --regexp=start --regexp=stop /tmp/file
/tmp/file:start
/tmp/file:stop
Incorrect parameters
$ zstdgrep --regexp start --regexp stop /tmp/file
zstd: can't stat --regexp : No such file or directory -- ignored 
grep: start: No such file or directory
zstd: can't stat stop : No such file or directory -- ignored 
grep: start: No such file or directory
grep: start: No such file or directory
$ zstdgrep -e start -e stop /tmp/file
zstd: can't stat -e : No such file or directory -- ignored 
zstd: can't stat stop : No such file or directory -- ignored 
/tmp/file:start

# --file= and -f with different behaviours for
# -, /dev/fd/0, /dev/stdin as pattern files 
# also regarding when to print the name of the examined file

$ echo start | grep -f - /tmp/file
start
$ echo start | zstdgrep -f - /tmp/file
$ echo start | grep --file=- /tmp/file
start
$ echo start | zstdgrep --file=- /tmp/file
/tmp/file:start

# --file= and -f with different behaviours
# Process substitutions with bash not working as with grep on its own

$ cp  /tmp/file /tmp/file2
$ grep --file=<( echo start ) /tmp/file /tmp/file2
/tmp/file:start
/tmp/file2:start
$ grep -f <( echo start ) /tmp/file /tmp/file2
/tmp/file:start
/tmp/file2:start
$ zstdgrep --file=<( echo start ) /tmp/file /tmp/file2
/tmp/file:start
/tmp/file2:start
$ zstdgrep -f <( echo start ) /tmp/file /tmp/file2
/tmp/file:start

Expected behavior For the behavious of the wrapper to mimic precisely what grep would do on its own.

Additional context Zstd version 1.4.4

bimbashrestha commented 4 years ago

All the inconsistencies listed except a few of the bash style commands at the end (which zgrep doesn't support either) have been fixed. I'm going to close this issue.

trantor commented 4 years ago

Not to be pedantic, but isn't the fact that the patch appears to be lifted mostly if not entirely from zgrep, which is under GPLv3 if I look correctly, a licensing problem? It was licensed as GPLv2 till 2007 but I wonder...

bimbashrestha commented 4 years ago

@trantor Thanks for the comment.

Yes the file is heavily inspired off of zgrep. The potential licensing issue is a good point. The hope was that the changes would be different enough from the original implementation to not infringe but I guess it's noticeably similar. It's hard to reimplement the same parsing logic in a substantially different way and have it still be correct, comprehensive, portable, etc (hard for me at least as a non-shell expert :/)

I'm not sure what a good alternative here is. Maybe it would be worth it to implement zstdgrep as a c program? Might be a non-trivial project but at least, we'd be able to close the door on any potential licensing issues. @Cyan4973?

bimbashrestha commented 4 years ago

Reverted the change. We've decided to keep the bug fixes for zstdgrep out of 1.4.5. We'll have to come back and fix this for 1.4.6 with preferably our own implementation of zstdgrep.

trantor commented 4 years ago

Well if the fragment of code is identical (or nearly so) to how it was in zgrep before it was licensed as GPLv3 back in 2007 and therefore still as GPLv2 it should not be a problem, I guess. Not sure though. Quite a number of fixes since then though

phraktle commented 2 months ago

I would suggest removing zstdgrep entirely from the package, unless it's actually passing params to grep, as being quietly broken is more dangerous than the inconvenience of not having this utility. (I for one used grep -e foo -e bar * and assumed it works correctly, but turns out bar was not being matched, only foo).