Is it possible to use formatted output with -ABC context?

Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more

https://ugrep.com

BSD 3-Clause "New" or "Revised" License


Is it possible to use formatted output with -ABC context? #309

Closed acelticsfan closed 6 months ago

acelticsfan commented 10 months ago

I prefer to use ugrep with the %u format field enabled so that each matching line is output only once, no matter how many matches it contains, as other grep utilities do.

But, format options are not allowed when using context options such as -A, -B, and -C.

Would it be difficult to add a CLI option that enables formatted output when using context options?

genivia-inc commented 10 months ago

The %u format field is a switch that when used in a format string forces output of one line with all matches, like grep normally does, unless option -u is used to ungroup matches from lines.
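
To illustrate the difference described above, here is a minimal Python model of per-match versus per-line output. It is only a sketch of the behavior as described in this thread, not ugrep's implementation; the function name `format_matches`, the `unique_lines` parameter, and the sample log lines are all made up for illustration.

```python
import re

def format_matches(lines, pattern, unique_lines):
    """Model of formatted output: emit one entry per match, or, when
    unique_lines is True (the role %u plays above), one entry per matching line."""
    rx = re.compile(pattern)
    out = []
    for number, line in enumerate(lines, start=1):
        matches = list(rx.finditer(line))
        if not matches:
            continue
        # Without the switch, the line is repeated once for every match on it.
        repeat = 1 if unique_lines else len(matches)
        out.extend(f"{number}:{line}" for _ in range(repeat))
    return out

# Hypothetical log lines; the first one contains two matches of "failed".
log = ["sshd: failed login, failed retry", "cron: job ok", "sshd: failed key"]
print(format_matches(log, r"failed", unique_lines=False))
print(format_matches(log, r"failed", unique_lines=True))
```

In this model the first call lists line 1 twice because it contains two matches, while the second call lists each matching line once.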

The context options -ABC cannot be used with formatted output in current ugrep versions. Perhaps I can add that later, although my impression is that -ABC is typically used to display output for humans to interpret, whereas formatted output is meant for "machines" to process. But that's just my impression.

acelticsfan commented 10 months ago

I do use both formatted and contextual output for human consumption, since I use the %u format field. I'm using ugrep to review system logs, and the number of matches and duplicate lines that I get with the context options (-ABC) is a bit overwhelming.

Thanks for the consideration on this.

genivia-inc commented 10 months ago

I will put this on the TODO list. If I understand correctly, then you want formatted output to include -ABC line context?

I am not sure what design would be best for this, because --format defines a single string to output matches, not context.

If we want to support -ABC context, then new --format-context-before and --format-context-after options will need to be added so that the context can have its own formatting string to distinguish context from matches. My concern is that the context before and the context after a match may need to be distinguishable. If they don't, a single --format-context option could suffice.
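
As a rough illustration of the idea, the following Python sketch gives matches, context before, and context after their own format strings. Everything in it is hypothetical: the function `render_with_context`, its parameters, and the `{n}`/`{line}` placeholders are stand-ins for discussion only, and the proposed --format-context-before and --format-context-after options do not exist in ugrep.

```python
import re

def render_with_context(lines, pattern, before, after,
                        fmt_match, fmt_before, fmt_after):
    """Format matching lines and their context with separate templates.
    Templates use {n} (line number) and {line} (line text); both are
    illustrative placeholders, not ugrep format fields."""
    rx = re.compile(pattern)
    hits = [i for i, line in enumerate(lines) if rx.search(line)]
    hit_set = set(hits)
    out = []
    for i, line in enumerate(lines):
        if i in hit_set:
            fmt = fmt_match
        elif any(0 < i - h <= after for h in hits):
            # Line follows an earlier match within the "after" distance.
            fmt = fmt_after
        elif any(0 < h - i <= before for h in hits):
            # Line precedes a later match within the "before" distance.
            fmt = fmt_before
        else:
            continue
        out.append(fmt.format(n=i + 1, line=line))
    return out

log = ["boot", "warn: disk", "error: io", "retry", "ok", "done"]
for row in render_with_context(log, r"error", 1, 1,
                               "M {n}:{line}", "B {n}-{line}", "A {n}+{line}"):
    print(row)
```

Passing the same template for `fmt_before` and `fmt_after` corresponds to the single --format-context variant mentioned above. A line that qualifies as both after-context and before-context is treated as after-context here, which is an arbitrary choice made for this sketch.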

I don't think a new format %-field could suffice to capture and output context, since that will be tricky to use and clutter the format strings that we currently have e.g. for CSV, JSON and XML.

acelticsfan commented 10 months ago

You could just make it work for non-formatted if you want.

It would be nice to have it work the same across the board, but I can use it either way.

Whatever you have the time or inclination to implement would be great, including whether to make it a CLI switch or format option.

Thanks!

acelticsfan commented 4 months ago

I can't seem to make this work. Is there something I'm missing?

This is how I'm running it. Output looks the same as before.

ugrep --sort=changed -sIzaPU --zmax=2 --format-open='%u%p/%a:%z%~' --format='%u%O%~' --format-close='%~' -C5 <string> <file.tgz>

genivia-inc commented 4 months ago

Context options are not supported with --format, at least not so far. --format can do pretty much everything you can do without it, with your own output format and other bells and whistles, but context is not one of those things. See the comments above, which say that I don't know how context should be represented: with other format option(s), with fields, or just "as is". I don't think it should be "as is", which gives no control at all.

acelticsfan commented 4 months ago

Okay, got it. So, there were no changes made with respect to this ticket due to ambiguity of spec or implementation.

genivia-inc commented 4 months ago

> Okay, got it. So, there were no changes made with respect to this ticket due to ambiguity of spec or implementation.

Yep. But it's not a "full stop no". I've added new features in the past that I thought would cause unwanted complications given the design requirements (*), but I added them once I found the best way to approach them with an algorithm and/or implementation. So I could revisit this sometime.

(*) For example, design requirements impose no limit on file sizes to search, which means that we don't want to store files in memory to search but rather search them in a window. (OK, that has nothing to do with this issue, but you get the idea).
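
To make the "search in a window" idea concrete, here is a minimal Python sketch that scans a stream line by line and keeps only a bounded number of previous lines in memory. It assumes line-oriented input and is not how ugrep is implemented; the function name `stream_matches` and its `before` parameter are hypothetical.

```python
from collections import deque
import io
import re

def stream_matches(stream, pattern, before=2):
    """Yield (line number, line, preceding lines) for each matching line,
    holding at most `before` earlier lines in memory at any time."""
    rx = re.compile(pattern)
    window = deque(maxlen=before)  # older lines fall out automatically
    for number, line in enumerate(stream, start=1):
        line = line.rstrip("\n")
        if rx.search(line):
            yield number, line, list(window)
        window.append(line)

# An in-memory stream stands in for a file of arbitrary size.
data = io.StringIO("a\nb\nerror 1\nc\nerror 2\n")
for number, line, context in stream_matches(data, r"error", before=2):
    print(number, line, context)
```

Memory use in this sketch depends on the window size rather than on the size of the input, which is the property the design requirement above is after.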