everypolitician / compare_with_wikidata

Library for diffing Wikidata and CSVs
MIT License

Only include/compare columns that are common to both sources #69

Closed tmtmtmtm closed 7 years ago

tmtmtmtm commented 7 years ago

Currently, if the two CSVs don't exactly match, we get the error in #59.

The morph proxy lets us vary the output of the CSV generated from there, but in many other cases the prompt-creator won't be able to control the format, e.g. if they point at an EveryPolitician term CSV. So currently, if someone wanted a prompt to compare the Twitter handles in an EP term CSV, they would also need to generate every other column in it!

A potential workaround for both of these problems is for the tool to pre-process both CSVs so that they only include the columns common to both before diffing them.

This would guarantee that the schema is always the same (or give a sensible error message if there are no overlapping columns). It would also make it much easier to consume an external CSV file: the SPARQL query would only need to include the required columns.
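
As a rough illustration of that pre-processing step, here is a minimal sketch in Python. It is not the library's code: the function names `read_csv` and `restrict_to_common`, the choice to keep the column order of the first CSV, and the use of `ValueError` for the "no overlapping columns" case are all assumptions made for the example.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text into (header list, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    # fieldnames is None for completely empty input
    return list(reader.fieldnames or []), rows


def restrict_to_common(csv_text_a, csv_text_b):
    """Reduce both CSVs to the columns they share.

    Returns (columns, rows_a, rows_b). Column order follows the first CSV
    (an arbitrary choice for this sketch). Raises ValueError if the two
    CSVs have no columns in common.
    """
    headers_a, rows_a = read_csv(csv_text_a)
    headers_b, rows_b = read_csv(csv_text_b)

    shared = set(headers_b)
    columns = [name for name in headers_a if name in shared]
    if not columns:
        raise ValueError(
            "No overlapping columns: %r vs %r" % (headers_a, headers_b)
        )

    def project(rows):
        # Keep only the shared columns in each row
        return [{name: row[name] for name in columns} for row in rows]

    return columns, project(rows_a), project(rows_b)


if __name__ == "__main__":
    # Hypothetical data: a narrow SPARQL result and a wider external CSV
    sparql_csv = "id,twitter\nQ1,alice\n"
    external_csv = "id,name,twitter,email\nQ1,Alice,alice,a@example.org\n"
    columns, left, right = restrict_to_common(sparql_csv, external_csv)
    print(columns)
    print(left == right)
```

With both sides projected onto the same columns, the diff step always sees an identical schema, and the SPARQL query alone decides which columns of the external CSV are compared.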