A fast diff tool for comparing csv files.
Csvdiff is a diff tool that computes the changes between two csv files.
I wanted to compare the rows of a table before and after a given time to see what new changes came in. I also wanted to compare columns selectively, ignoring columns like the created_at and updated_at timestamps. All I had were the dumped csv files.
$ csvdiff base.csv delta.csv
# Additions (1)
+ 24564,907,completely-newsite.com,com,19827,32902,completely-newsite.com,com,1621,909,19787,32822
# Modifications (1)
- 69,48,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491
+ 69,1048,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491
# Deletions (1)
- 1618,907,deleted-website.com,com,19827,32902,deleted-website.com,com,1621,909,19787,32822
Differentiates two csv files and finds out the additions and modifications.
Most suitable for csv files created from database tables
Usage:
csvdiff <base-csv> <delta-csv> [flags]
Flags:
--columns ints Selectively compare positions in CSV Eg: 1,2. Default is entire row
-o, --format string Available (rowmark|json|legacy-json|diff|word-diff|color-words) (default "diff")
-h, --help help for csvdiff
--ignore-columns ints Inverse of --columns flag. This cannot be used if --columns are specified
--include ints Include positions in CSV to display Eg: 1,2. Default is entire row
-p, --primary-key ints Primary key positions of the Input CSV as comma separated values Eg: 1,2 (default [0])
-s, --separator string use specific separator (\t, or any one character string) (default ",")
--time Measure time
-t, --toggle Help message for toggle
--version version for csvdiff
brew tap thecasualcoder/stable
brew install csvdiff
# binary will be $GOPATH/bin/csvdiff
curl -sfL https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s -- -b $GOPATH/bin
# or install it into ./bin/
curl -sfL https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s
# In alpine linux (as it does not come with curl by default)
wget -O - -q https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s
go get -u github.com/aswinkarthik/csvdiff
The additions.csv can be used to create an insert.sql, and the modifications.csv an update.sql, for a data migration. A number of output formats are supported:
diff: Git's diff style
word-diff: Git's --word-diff style
color-words: Git's --color-words style
json: JSON serialization of the result
legacy-json: JSON serialization of the result in the old format
rowmark: Marks each row with ADDED or MODIFIED status.
The --primary-key flag takes an integer array. Specify comma-separated positions if the table has a compound key. Using this primary key, csvdiff can figure out modifications. If the primary key changes, the row is reported as an addition.
% csvdiff base.csv delta.csv --primary-key 0,1
To compare only selected column positions, add --columns:
% csvdiff base.csv delta.csv --primary-key 0,1 --columns 2
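With --primary-key 0,1 a row's key comes from positions 0 and 1 together, and --columns 2 limits the compared value to position 2. A sketch of that field selection using the standard encoding/csv reader; selectFields is a hypothetical helper, and joining the picked fields with commas follows the description of the key as "the primary key values as csv" rather than csvdiff's actual code.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// selectFields joins the fields at the given 0-based positions into a csv
// fragment. Hypothetical helper; fields containing commas would need csv
// quoting, which is ignored here for brevity.
func selectFields(row []string, positions []int) string {
	picked := make([]string, len(positions))
	for i, p := range positions {
		picked[i] = row[p]
	}
	return strings.Join(picked, ",")
}

func main() {
	r := csv.NewReader(strings.NewReader(
		"69,48,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491\n"))
	row, err := r.Read()
	if err != nil {
		panic(err)
	}
	key := selectFields(row, []int{0, 1}) // like --primary-key 0,1
	value := selectFields(row, []int{2})  // like --columns 2
	fmt.Println(key, value)               // 69,48 aol.com
}
```

Under this keying, the aol.com row from the demo (48 changed to 1048) gets a different key, so by the rule above it would be reported as an addition rather than a modification.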
% csvdiff examples/base-small.csv examples/delta-small.csv --format json | jq '.'
{
  "Additions": [
    "24564,907,completely-newsite.com,com,19827,32902,completely-newsite.com,com,1621,909,19787,32822"
  ],
  "Modifications": [
    {
      "Original": "69,1048,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491",
      "Current": "69,1049,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491"
    }
  ],
  "Deletions": [
    "1615,905,deleted-website.com,com,19833,33110,deleted-website.com,com,1613,902,19835,33135"
  ]
}
$ git clone https://github.com/aswinkarthik/csvdiff
$ go get ./...
$ go build
# To run tests
$ go get github.com/stretchr/testify/assert
$ go test -v ./...
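Both files are loaded into maps where the key is a hash of the primary key values as csv and the value is a hash of the entire row, so a row keeps the same key as long as its primary key is unchanged. An entry in the delta map is an addition if the base map has no value for its key, and a modification if the base map's value is different.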
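The map comparison described above can be sketched as follows. This is an illustration under stated assumptions, not csvdiff's source: FNV-1a from hash/fnv stands in for whatever hash the tool uses, rows are split on a plain comma without csv quoting, the function names are hypothetical, and deletions (base keys absent from delta) are counted to match the Deletions section the tool prints.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// hash64 is an assumed stand-in for csvdiff's 64-bit hash.
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// buildMap keys each row by the hash of its primary-key fields (joined as
// csv) and stores the hash of the whole row as the value.
func buildMap(rows []string, pk []int) map[uint64]uint64 {
	m := make(map[uint64]uint64, len(rows))
	for _, row := range rows {
		fields := strings.Split(row, ",")
		keyParts := make([]string, len(pk))
		for i, p := range pk {
			keyParts[i] = fields[p]
		}
		m[hash64(strings.Join(keyParts, ","))] = hash64(row)
	}
	return m
}

// diff counts additions, modifications and deletions. A real tool would keep
// the rows themselves so it can print them.
func diff(base, delta map[uint64]uint64) (added, modified, deleted int) {
	for k, v := range delta {
		bv, ok := base[k]
		switch {
		case !ok:
			added++ // key missing from base
		case bv != v:
			modified++ // same key, different row hash
		}
	}
	for k := range base {
		if _, ok := delta[k]; !ok {
			deleted++ // key missing from delta
		}
	}
	return
}

func main() {
	base := []string{"69,48,aol.com", "1618,907,deleted-website.com"}
	delta := []string{"69,1048,aol.com", "24564,907,completely-newsite.com"}
	a, m, d := diff(buildMap(base, []int{0}), buildMap(delta, []int{0}))
	fmt.Println(a, m, d) // 1 1 1
}
```

Because only two uint64 hashes are stored per row, memory use stays small even for wide rows, at the cost of a tiny chance of hash collisions.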