aswinkarthik / csvdiff

A fast diff tool for comparing csv files
https://aswinkarthik.github.io/csvdiff/
MIT License

csvdiff

A fast diff tool for comparing csv files.

What is csvdiff?

csvdiff is a diff tool that computes the changes between two CSV files.

Why?

I wanted to compare the rows of a table before and after a given point in time and see which changes had come in. I also wanted to compare columns selectively, ignoring columns such as created_at and updated_at. All I had were the dumped CSV files.

Demo

asciicast

Usage

$ csvdiff base.csv delta.csv
# Additions (1)
+ 24564,907,completely-newsite.com,com,19827,32902,completely-newsite.com,com,1621,909,19787,32822
# Modifications (1)
- 69,48,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491
+ 69,1048,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491
# Deletions (1)
- 1618,907,deleted-website.com,com,19827,32902,deleted-website.com,com,1621,909,19787,32822

$ csvdiff --help
Differentiates two csv files and finds out the additions and modifications.
Most suitable for csv files created from database tables

Usage:
  csvdiff <base-csv> <delta-csv> [flags]

Flags:
      --columns ints          Selectively compare positions in CSV Eg: 1,2. Default is entire row
  -o, --format string         Available (rowmark|json|legacy-json|diff|word-diff|color-words) (default "diff")
  -h, --help                  help for csvdiff
      --ignore-columns ints   Inverse of --columns flag. This cannot be used if --columns are specified
      --include ints          Include positions in CSV to display Eg: 1,2. Default is entire row
  -p, --primary-key ints      Primary key positions of the Input CSV as comma separated values Eg: 1,2 (default [0])
  -s, --separator string      use specific separator (\t, or any one character string) (default ",")
      --time                  Measure time
  -t, --toggle                Help message for toggle
      --version               version for csvdiff
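
The additions, modifications and deletions reported above can be pictured as a keyed comparison of the two files. The following Go program is only a minimal sketch of that idea, not csvdiff's actual implementation: the names Result, index and Diff are hypothetical, both files are held in memory, and rows are rejoined with a plain comma, so quoted fields lose their quoting.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Result groups rows the way the diff output above does (hypothetical type).
type Result struct {
	Additions     []string
	Modifications [][2]string // each entry is {original row, current row}
	Deletions     []string
}

// index parses CSV text and maps each primary-key value to its full row.
// It also returns the keys in first-seen order so output is deterministic.
func index(data string, pk []int) (map[string]string, []string, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	byKey := make(map[string]string, len(rows))
	order := make([]string, 0, len(rows))
	for _, row := range rows {
		parts := make([]string, len(pk))
		for i, p := range pk {
			if p < 0 || p >= len(row) {
				return nil, nil, fmt.Errorf("primary key position %d out of range", p)
			}
			parts[i] = row[p]
		}
		k := strings.Join(parts, "\x00")
		if _, seen := byKey[k]; !seen {
			order = append(order, k)
		}
		byKey[k] = strings.Join(row, ",") // on duplicate keys the last row wins
	}
	return byKey, order, nil
}

// Diff classifies every row of delta against base using the primary key.
func Diff(base, delta string, pk []int) (Result, error) {
	var res Result
	baseRows, baseOrder, err := index(base, pk)
	if err != nil {
		return res, err
	}
	deltaRows, deltaOrder, err := index(delta, pk)
	if err != nil {
		return res, err
	}
	for _, k := range deltaOrder {
		old, ok := baseRows[k]
		switch {
		case !ok: // key only in delta
			res.Additions = append(res.Additions, deltaRows[k])
		case old != deltaRows[k]: // same key, different row content
			res.Modifications = append(res.Modifications, [2]string{old, deltaRows[k]})
		}
	}
	for _, k := range baseOrder {
		if _, ok := deltaRows[k]; !ok { // key only in base
			res.Deletions = append(res.Deletions, baseRows[k])
		}
	}
	return res, nil
}

func main() {
	// Made-up sample data, with column 0 as the primary key.
	base := "1,alice,10\n2,bob,20\n3,carol,30\n"
	delta := "1,alice,10\n2,bob,25\n4,dave,40\n"
	res, err := Diff(base, delta, []int{0})
	if err != nil {
		panic(err)
	}
	fmt.Printf("# Additions (%d)\n", len(res.Additions))
	for _, r := range res.Additions {
		fmt.Println("+", r)
	}
	fmt.Printf("# Modifications (%d)\n", len(res.Modifications))
	for _, m := range res.Modifications {
		fmt.Println("-", m[0])
		fmt.Println("+", m[1])
	}
	fmt.Printf("# Deletions (%d)\n", len(res.Deletions))
	for _, r := range res.Deletions {
		fmt.Println("-", r)
	}
}
```

With the sample data, the row with key 4 is reported as an addition, key 2 as a modification and key 3 as a deletion. Restricting the comparison to selected columns, as --columns does, would mean comparing only those fields instead of the whole row.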

Installation

Homebrew

brew tap thecasualcoder/stable
brew install csvdiff

Using binaries

# binary will be $GOPATH/bin/csvdiff
curl -sfL https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s -- -b $GOPATH/bin

# or install it into ./bin/
curl -sfL https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s

# In alpine linux (as it does not come with curl by default)
wget -O - -q https://raw.githubusercontent.com/aswinkarthik/csvdiff/master/install.sh | sh -s

Using source code

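go get -u github.com/aswinkarthik/csvdiff

# On recent Go releases (1.18+), go get no longer builds or installs binaries; use go install instead
go install github.com/aswinkarthik/csvdiff@latest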

Use case

Supported

Not Supported

Formats

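The --format (-o) flag selects the output format. The supported formats are rowmark, json, legacy-json, diff (the default), word-diff and color-words.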

Miscellaneous features

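# Use columns 0 and 1 together as the primary key
$ csvdiff base.csv delta.csv --primary-key 0,1

# Compare only column 2 for rows matched on that key
$ csvdiff base.csv delta.csv --primary-key 0,1 --columns 2

# Emit JSON and pretty-print it with jq
$ csvdiff examples/base-small.csv examples/delta-small.csv --format json | jq '.'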
{
  "Additions": [
    "24564,907,completely-newsite.com,com,19827,32902,completely-newsite.com,com,1621,909,19787,32822"
  ],
  "Modifications": [{
    "Original": "69,1048,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491",
    "Current":  "69,1049,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491"
  }],
  "Deletions": [
    "1615,905,deleted-website.com,com,19833,33110,deleted-website.com,com,1613,902,19835,33135"
  ]
}
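
The JSON output above is convenient to consume from other programs. As a minimal sketch, assuming the output always has the shape shown (the type names Differences and Modification and the function ParseDifferences are hypothetical, not part of csvdiff), it could be decoded in Go like this:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Modification mirrors one entry of the "Modifications" array shown above.
type Modification struct {
	Original string
	Current  string
}

// Differences mirrors the top-level JSON object shown above.
type Differences struct {
	Additions     []string
	Modifications []Modification
	Deletions     []string
}

// sampleJSON is the example output copied from this README.
const sampleJSON = `{
  "Additions": [
    "24564,907,completely-newsite.com,com,19827,32902,completely-newsite.com,com,1621,909,19787,32822"
  ],
  "Modifications": [{
    "Original": "69,1048,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491",
    "Current":  "69,1049,aol.com,com,97543,225532,aol.com,com,70,49,97328,224491"
  }],
  "Deletions": [
    "1615,905,deleted-website.com,com,19833,33110,deleted-website.com,com,1613,902,19835,33135"
  ]
}`

// ParseDifferences decodes JSON in the shape shown above.
func ParseDifferences(data []byte) (Differences, error) {
	var d Differences
	err := json.Unmarshal(data, &d)
	return d, err
}

func main() {
	d, err := ParseDifferences([]byte(sampleJSON))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d added, %d modified, %d deleted\n",
		len(d.Additions), len(d.Modifications), len(d.Deletions))
	for _, m := range d.Modifications {
		fmt.Println("was:", m.Original)
		fmt.Println("now:", m.Current)
	}
}
```

In a real pipeline the bytes would come from standard input or a file rather than from the embedded sample.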

Build locally

$ git clone https://github.com/aswinkarthik/csvdiff
$ cd csvdiff
$ go get ./...
$ go build

# To run tests
$ go get github.com/stretchr/testify/assert
$ go test -v ./...

Algorithm

Credits