BurntSushi / xsv

A fast CSV command line toolkit written in Rust.
The Unlicense
10.4k stars 324 forks

Feature request: Diff mode #210

Open turion opened 4 years ago

turion commented 4 years ago

I frequently have to compare large csv files where only a few fields in moderately many rows have changed. It would be cool to have a diff mode that shows the cell-wise diff of two csv files.

BurntSushi commented 4 years ago

Could you please provide more details? An example with inputs and outputs would help.

turion commented 4 years ago

> An example with inputs and outputs would help.

Good idea.

Here is a rough sketch:

$ cat a.csv
foo,bar
a,23
b,42
[...many lines]

$ cat b.csv
foo,bar
a,100
b,42
[...many lines]
c,0

$ xsv --diff a.csv b.csv
@@ -1,bar +1,bar @@ foo,bar
  a,-23+100
@@ -1234 +1234 @@ foo,bar
+ c,0

$ xsv table --diff a.csv b.csv
  foo bar
  a   -23+100
+ c   0
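
For readers who want to experiment with the idea, here is a minimal Python sketch of a positional, cell-wise diff. It is not xsv functionality: the function name `cell_diff` and the exact output format are assumptions, loosely modeled on the sketch above, with `-` marking the value from the first file and `+` the value from the second. Rows are paired purely by line position, so an inserted row in the middle of a file would make every later row look changed.

```python
import csv
import io


def cell_diff(old_text, new_text):
    """Compare two CSV documents row by row, pairing rows by position.

    Returns a list of output lines. Changed rows show each changed cell as
    -old+new, rows only in the new input are prefixed with '+', rows only
    in the old input with '-'. Unchanged rows are omitted.
    """
    old_rows = list(csv.reader(io.StringIO(old_text)))
    new_rows = list(csv.reader(io.StringIO(new_text)))
    lines = []
    for i in range(max(len(old_rows), len(new_rows))):
        old = old_rows[i] if i < len(old_rows) else None
        new = new_rows[i] if i < len(new_rows) else None
        if old == new:
            continue
        if old is None:
            lines.append("+ " + ",".join(new))
        elif new is None:
            lines.append("- " + ",".join(old))
        else:
            cells = []
            for j in range(max(len(old), len(new))):
                o = old[j] if j < len(old) else ""
                n = new[j] if j < len(new) else ""
                cells.append(o if o == n else f"-{o}+{n}")
            lines.append("  " + ",".join(cells))
    return lines


# Hypothetical inputs mirroring a.csv and b.csv, without the elided lines.
a = "foo,bar\na,23\nb,42\n"
b = "foo,bar\na,100\nb,42\nc,0\n"
for line in cell_diff(a, b):
    print(line)
```

A real implementation would likely need row alignment (by key or by a sequence-diff algorithm) and streaming rather than loading both files into memory.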

turion commented 4 years ago

I guess there are other interesting interactions. For example, xsv stats --diff could show the number of changed rows and cells, and xsv select --diff could limit the diff to certain columns.
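
As a rough illustration of what those two interactions might compute, here is a small Python sketch. Nothing here exists in xsv: `diff_stats` and its `columns` parameter are hypothetical names, rows are paired by position, and both inputs are assumed to share the same header.

```python
import csv
import io


def diff_stats(old_text, new_text, columns=None):
    """Count changed rows and cells between two CSVs that share a header.

    Rows are paired by position. `columns` optionally restricts the
    comparison to the named columns. Rows present in only one input are
    counted separately as added or removed.
    """
    old_rows = list(csv.DictReader(io.StringIO(old_text)))
    new_rows = list(csv.DictReader(io.StringIO(new_text)))
    changed_rows = 0
    changed_cells = 0
    for old, new in zip(old_rows, new_rows):
        names = columns if columns is not None else list(old.keys())
        n = sum(1 for name in names if old.get(name) != new.get(name))
        changed_cells += n
        changed_rows += 1 if n else 0
    return {
        "changed_rows": changed_rows,
        "changed_cells": changed_cells,
        "added_rows": max(len(new_rows) - len(old_rows), 0),
        "removed_rows": max(len(old_rows) - len(new_rows), 0),
    }


# Hypothetical inputs mirroring a.csv and b.csv, without the elided lines.
a = "foo,bar\na,23\nb,42\n"
b = "foo,bar\na,100\nb,42\nc,0\n"
print(diff_stats(a, b))                   # all columns
print(diff_stats(a, b, columns=["foo"]))  # restricted to column foo
```

Restricting to `foo` reports no changed cells here, because only the `bar` value differs in the rows both inputs share.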

Yomguithereal commented 4 years ago

Hello @turion, do you know daff? Your initial question reminded me of this tool, which seems to do the job you need. It can also be easily integrated with git, if I remember correctly.

turion commented 4 years ago

@Yomguithereal that sounds cool! Yes, that's sort of the feature set I'd like to see.

nicoburns commented 4 years ago

I think a daff-style diff is the way to go for this feature. Daff actually has a spec: http://paulfitz.github.io/daff-doc/spec.html, and the codebase (written in Haxe) is MIT licensed.

kevinji commented 3 years ago

The simplest version of this that would be useful for me would contain:

Something like this is useful if you have a job that snapshots state periodically and you need to figure out what changed. Here, the format rarely changes but the contents often do.
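
For the snapshot use case, one possible approach is to match rows by a key column rather than by position, so that reordered or inserted rows do not show up as spurious changes. The Python sketch below is only an illustration under stated assumptions: `snapshot_diff`, the `id` key column, and the sample data are all hypothetical, both snapshots are assumed to share the same header, and key values are assumed to be unique.

```python
import csv
import io


def snapshot_diff(old_text, new_text, key):
    """Diff two CSV snapshots, matching rows by the value in column `key`.

    Returns (added_keys, removed_keys, changed), where `changed` maps a key
    to {column: (old_value, new_value)} for each cell that differs.
    """
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}
    added = [k for k in new if k not in old]
    removed = [k for k in old if k not in new]
    changed = {}
    for k in old:
        if k in new and old[k] != new[k]:
            # Columns are taken from the new row; headers are assumed equal.
            changed[k] = {
                col: (old[k].get(col), new[k].get(col))
                for col in new[k]
                if old[k].get(col) != new[k].get(col)
            }
    return added, removed, changed


# Hypothetical snapshots taken on two consecutive days.
yesterday = "id,status\n1,ok\n2,ok\n3,failed\n"
today = "id,status\n1,ok\n3,ok\n4,ok\n"
added, removed, changed = snapshot_diff(yesterday, today, key="id")
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

This loads both snapshots into memory, which is simple but may not suit very large files; a sorted-merge over key-ordered inputs would be one alternative.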