aswinkarthik / csvdiff

A fast diff tool for comparing csv files
https://aswinkarthik.github.io/csvdiff/
MIT License

Compare files with different columns count #39

Status: Open. tmtben opened this issue 4 years ago.

tmtben commented 4 years ago

Hello,

Thanks for this great tool!

Here is my use case:
• My base-csv contains whole rows with many columns.
• My delta-csv contains a list of primary keys (one column only).

I would like to get a diff that compares only the primary keys.

I got this error with csvdiff version 1.3.0:

# csvdiff base.csv pk.csv
csvdiff: command failed - base-file and delta-file columns count do not match

Best regards, Ben

aswinkarthik commented 4 years ago

While validating the configuration, csvdiff checks that both files have the same column count here.

How about this:

csvdiff base.csv pk.csv --columns 0

This compares only one column, and csvdiff will also check that a column at that index is present in both CSV files.
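The "index is present in both files" check described above can be sketched as follows, assuming the column count of each file is already known (for example, from its first line). `validateColumnPositions` is a hypothetical name, not csvdiff's actual function.

```go
package main

import "fmt"

// validateColumnPositions returns an error if any requested column index is
// out of range for either file. baseCols and deltaCols are the column counts
// of base-file and delta-file.
func validateColumnPositions(positions []int, baseCols, deltaCols int) error {
	for _, pos := range positions {
		if pos < 0 {
			return fmt.Errorf("column index %d is negative", pos)
		}
		if pos >= baseCols {
			return fmt.Errorf("column %d not present in base-file (%d columns)", pos, baseCols)
		}
		if pos >= deltaCols {
			return fmt.Errorf("column %d not present in delta-file (%d columns)", pos, deltaCols)
		}
	}
	return nil
}

func main() {
	// A 5-column base.csv and a 1-column pk.csv: column 0 exists in both.
	fmt.Println(validateColumnPositions([]int{0}, 5, 1)) // <nil>
	// Column 2 exists only in base.csv, so this returns an error.
	fmt.Println(validateColumnPositions([]int{2}, 5, 1))
}
```

On its own this check would accept `--columns 0` for base.csv and pk.csv; the error reported in this thread comes from the separate column-count comparison.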

tmtben commented 4 years ago

Using version 1.4 with the "--columns" flag:

# csvdiff base.csv pk.csv --columns 0
csvdiff: command failed - base-file and delta-file columns count do not match

aswinkarthik commented 4 years ago

I was thinking of introducing a feature where, if the --columns flag is specified, we don't need to check whether the column counts match. All that matters is whether the specified columns are present in both CSVs.

That would satisfy your requirement, I believe.

pascalbe-dev commented 4 years ago

Any update on this?

This would be really nice. Furthermore, it would be nice if we could see which columns have been added (and with which values).

tmtben commented 3 years ago

Maybe just replace the following line:
https://github.com/aswinkarthik/csvdiff/blob/1007bf3599b3077a22e754281e671e8ac2996d42/cmd/config.go#L55
with this one:
if len(valueColumnPositions) == 0 && baseRecordCount != deltaRecordCount {

With that change, I can use "--columns 0" to compare only the first column of two CSV files that have different headers.