18F / data-inventory

18F's contributions to the GSA enterprise data inventory and public data listing

CSV validator and continuous integration? #6

Open harrisj opened 9 years ago

harrisj commented 9 years ago

If this becomes bigger (or the GSA starts reading data.csv on a more automated basis), we should consider adding an automated script for validating the CSV (could look at csvkit), and possibly a CI step that would validate any CSV changes in pull requests.

harrisj commented 9 years ago

It turns out this is not how the GSA workflow works. Instead, they manually edit the JSON with a tool, using the information we give them in the CSV. That opens the door to data entry errors, so there are really two types of scripts we might want to create in the future:

  1. A script that validates entries in our CSV against a set of rules and known issues we outline. We could use csv-test as a basis, although it would be nice to be able to snooze alerts about some rows.
  2. A script that compares data in the GSA's data.json to information in our CSV. This would be useful for catching data entry issues.
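
As a rough illustration of the first kind of script, here is a minimal sketch using only Python's standard `csv` module rather than csv-test. The column names in `REQUIRED_COLUMNS`, the `validate_rows` function, and the `snoozed` parameter are all hypothetical; the real data.csv may use different headers and would likely need richer rules.

```python
import csv
import io

# Hypothetical column names; the real data.csv may use different headers.
REQUIRED_COLUMNS = ("title", "description", "contact_email")


def validate_rows(csv_text, snoozed=frozenset()):
    """Return a list of (row_number, message) for rows that break a rule.

    `snoozed` holds (row_number, column) pairs whose alerts are suppressed.
    The only rule in this sketch is "required columns must not be empty".
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            value = (row.get(column) or "").strip()
            if value:
                continue  # rule satisfied
            if (row_number, column) in snoozed:
                continue  # alert deliberately silenced
            problems.append((row_number, f"missing {column}"))
    return problems
```

Snoozing by row number is the simplest option but breaks when rows are reordered; keying the snooze list on a stable identifier column, if the CSV has one, would probably hold up better.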

Neither of these would need to be run in a Continuous Integration loop, but they would be good for regular spot checks of the data.json file.
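
A spot check of the second kind could be sketched along these lines. This assumes the data.json keeps its entries in a top-level `dataset` list and that both files share a `title` field to match on; the `compare_inventory` function and the names in `COMPARED_FIELDS` are hypothetical and would need to be mapped to the real CSV headers and JSON keys.

```python
import csv
import io
import json

# Hypothetical fields to compare; adjust to the real CSV headers and JSON keys.
COMPARED_FIELDS = ("description", "contact_email")


def compare_inventory(csv_text, json_text, key="title"):
    """Return human-readable mismatches between our CSV and the data.json."""
    csv_rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    # Assumption: entries live in a top-level "dataset" list.
    json_rows = {entry[key]: entry for entry in json.loads(json_text)["dataset"]}

    mismatches = []
    # Entries present on one side only.
    for name in sorted(csv_rows.keys() - json_rows.keys()):
        mismatches.append(f"{name}: in CSV but not in JSON")
    for name in sorted(json_rows.keys() - csv_rows.keys()):
        mismatches.append(f"{name}: in JSON but not in CSV")
    # Entries present on both sides: compare field by field.
    for name in sorted(csv_rows.keys() & json_rows.keys()):
        for field in COMPARED_FIELDS:
            ours = (csv_rows[name].get(field) or "").strip()
            theirs = str(json_rows[name].get(field) or "").strip()
            if ours != theirs:
                mismatches.append(f"{name}: {field} differs ({ours!r} vs {theirs!r})")
    return mismatches
```

Exact string comparison will flag harmless differences such as trailing punctuation, so a real version might normalize values before comparing.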