arielf / cuts

Unix 'cut' (and 'paste') on steroids: more flexible select columns from files
Artistic License 2.0

How to fetch all cols except first? #5

Open unphased opened 9 years ago

unphased commented 9 years ago

Sorry if I am missing something really basic. But here's the use-case. Right now I really want to diff two files and my diff is showing me something weird, so I wanted to diff their hexdumps.

Initially, diff <(xxd file1) <(xxd file2) was a good start. The problem was the offset column. My diff is a special character-wise diff that can handle newlines, so all I need now is to remove the first column of xxd's output. I thought, hey, that new cuts util I grabbed is perfect for the job!

I couldn't come up with a good way to do this at first.

xxd index.php | cuts -d':' -1

This doesn't work. The last "column" of xxd's output is the ASCII rendering of the bytes, which can contain anything, including colons. On lines where it does, splitting on ':' cuts that column short and a bunch of data goes missing.

xxd index.php | cuts 1--1

I was hoping it would detect space as the delimiter (which it does!!) and give me the columns from 1 through -1, i.e. column 1 through the last one, which amounts to removing the first column. This does not work: the output appears to be column 1, then column 0, then column -1. This is almost certainly a bug.

Then I tried this:

xxd index.php | cuts 1-99

This is an attempt to hack around the previous failure, and it comes closer. However, the columns are re-printed separated by what appear to be tabs. Two points to make here. (1) I think it would make the most sense to replicate the input delimiter exactly on output; I shouldn't have to use -D. (2) -D adds a delimiter for every column specified. Since I asked for 99 columns, there are now exactly 98 spaces in the output... This is just a consequence of my hack.
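For comparison, here is a minimal sketch of the "everything except the first column" selection using plain POSIX cut rather than cuts. It assumes a single space as the delimiter; the helper name drop_first_field and the sample line are made up for illustration. An open-ended range (-f2-) avoids guessing an upper bound like 99, and cut re-emits the input delimiter unchanged, including the run of spaces before the ASCII column.

```shell
# Hypothetical helper: drop the first space-delimited field from each line.
# Uses POSIX cut (not cuts); -f2- means "field 2 through the last field".
drop_first_field() {
  cut -d' ' -f2-
}

# Sample line shaped like xxd output (offset, hex groups, ASCII rendering).
printf '00000000: 3c3f 7068 700a  <?php.\n' | drop_first_field
# prints: 3c3f 7068 700a  <?php.
```

With real input this would be used as: xxd index.php | drop_first_field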

I basically had to give up on the column selection syntax and went for this solution using regex:

xxd index.php | cuts -d'^[a-f0-9]+:' 1
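The same idea, anchoring a regex at the start of the line so that colons in the ASCII column are never touched, can be sketched with sed instead of cuts. This is an illustration under assumptions, not a description of how cuts works internally: the helper name strip_offset and the sample line are hypothetical, and the pattern assumes lowercase hex offsets as in xxd's default output.

```shell
# Hypothetical helper: remove the leading "offset:" column from xxd-style lines.
# The ^ anchor means only the offset at the start of the line can match,
# so any colons later in the line (e.g. in the ASCII column) are left alone.
strip_offset() {
  sed -E 's/^[0-9a-f]+:[[:space:]]*//'
}

# Sample line whose ASCII column contains colons.
printf '00000010: 613a 623a 630a  a:b:c.\n' | strip_offset
# prints: 613a 623a 630a  a:b:c.
```

With real input this would be used as: diff <(xxd file1 | strip_offset) <(xxd file2 | strip_offset)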

The conclusion is that cuts is remarkably powerful, primarily because of the regex capability. However, to make it less mentally taxing to use, I think the column index specification syntax could be improved.

I understand that even though the scope of this utility is rather narrow on paper, it embeds a whole crap load of complexity in the edge-cases. So take your time to get this whole thing sorted... Thanks for implementing the regexes, though, it's plenty useful to have that and the built-in paste capability.

arielf commented 9 years ago

Thanks for the input.

Positive-to-negative ranges are a known issue, see: https://github.com/arielf/cuts/issues/3

Glad that you found a way in the end to achieve what you needed, thanks to regexps (pretty clever, I have to say).