Open turion opened 4 years ago
Could you please provide more details? An example with inputs and outputs would help.
Good idea.
Here is a rough sketch:
$ cat a.csv
foo,bar
a,23
b,42
[...many lines]
$ cat b.csv
foo,bar
a,100
b,42
[...many lines]
c,0
$ xsv --diff a.csv b.csv
@@ -1,bar +1,bar @@ foo,bar
a,-23+100
@@ -1234 +1234 @@ foo,bar
+ c,0
$ xsv table --diff a.csv b.csv
foo bar
a -23+100
+ c 0
I guess there are other interesting interactions. E.g. xsv stats --diff could show the number of changed rows and cells, and xsv select --diff could limit the diff to certain columns.
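To make the idea concrete, here is a minimal Python sketch of a cell-wise diff. It is not how xsv works or would implement this; the function name `cell_diff`, its parameters, and the `~` marker for changed rows are hypothetical. It assumes the first column (`foo`) is a unique key and that both files fit in memory. The optional `columns` argument mimics the "limit the diff to certain columns" idea and only affects how shared rows are compared.

```python
import csv
import io


def cell_diff(old_text, new_text, key="foo", columns=None):
    """Compare two CSV texts row by row, matching rows on the `key` column.

    Returns a list of (marker, key_value, cells) tuples:
      "-" row only in the old file, "+" row only in the new file,
      "~" row in both, with each changed cell rendered as "-old+new".
    """
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}

    result = []
    # Keep the old file's order, then append keys that only exist in the new file.
    for k in list(old) + [k for k in new if k not in old]:
        if k not in new:
            result.append(("-", k, dict(old[k])))
        elif k not in old:
            result.append(("+", k, dict(new[k])))
        else:
            cols = columns if columns is not None else [c for c in new[k] if c != key]
            changed = {
                c: f"-{old[k].get(c)}+{new[k].get(c)}"
                for c in cols
                if old[k].get(c) != new[k].get(c)
            }
            if changed:
                result.append(("~", k, changed))
    return result


# Shortened versions of the a.csv / b.csv sketch above.
OLD = "foo,bar\na,23\nb,42\n"
NEW = "foo,bar\na,100\nb,42\nc,0\n"

for marker, k, cells in cell_diff(OLD, NEW):
    print(marker, k, cells)
```

With these inputs, row `a` is reported with `bar` changing from 23 to 100, row `b` is omitted because it is unchanged, and row `c` is reported as added.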
Hello @turion, do you know daff? Reading your initial question, I remembered this tool, which seems to do the job you need. It can also be easily integrated with git, if I remember correctly.
@Yomguithereal that sounds cool! Yes, that's sort of the feature set I'd like to see.
I think a daff-style diff is the way to go for this feature. Daff actually has a spec: http://paulfitz.github.io/daff-doc/spec.html, and the codebase (written in Haxe) is MIT licensed.
The simplest version of this that would be useful for me would contain:
Something like this is useful if you have a job that snapshots state periodically and you need to figure out what changed. Here, the format rarely changes but the contents often do.
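For the snapshot use case, a summary is often enough as a first pass. The following Python sketch counts added, removed, and changed rows plus changed cells between two snapshots. The function name `diff_summary`, the `id` key column, and the sample data are all made up for illustration; it assumes both snapshots share the same header and that the key column is unique.

```python
import csv
import io


def diff_summary(old_text, new_text, key):
    """Count row-level and cell-level differences between two CSV snapshots."""
    old = {r[key]: r for r in csv.DictReader(io.StringIO(old_text))}
    new = {r[key]: r for r in csv.DictReader(io.StringIO(new_text))}
    shared = old.keys() & new.keys()

    # A cell counts as changed if the same key and column hold different values.
    changed_cells = sum(
        1
        for k in shared
        for col in new[k]
        if old[k].get(col) != new[k][col]
    )
    changed_rows = sum(1 for k in shared if old[k] != new[k])

    return {
        "added_rows": len(new.keys() - old.keys()),
        "removed_rows": len(old.keys() - new.keys()),
        "changed_rows": changed_rows,
        "changed_cells": changed_cells,
    }


# Hypothetical snapshots: row 2 changes in two cells, row 3 disappears, row 4 appears.
YESTERDAY = "id,status,count\n1,ok,5\n2,ok,7\n3,down,0\n"
TODAY = "id,status,count\n1,ok,5\n2,down,0\n4,ok,1\n"

print(diff_summary(YESTERDAY, TODAY, key="id"))
```

Matching rows on a key rather than on line position means a row inserted in the middle of the file does not make every following row look changed.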
I frequently have to compare large CSV files where only a few fields in a moderate number of rows have changed. It would be cool to have a diff mode that shows the cell-wise diff of two CSV files.