Add data sanity check between ETL and aggregation

Create an independent script which checks:

Are all expected data sources present on S3?
Is the number of files available on S3 correct, or at least the right order of magnitude?
Are the individual file sizes sane?
Does datapackage.json include all sources and files?
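
A rough sketch of what these checks could look like in Python. Everything project-specific here is an assumption for illustration: the source names and thresholds in EXPECTED, the `<source>/<file>` key layout on S3, and the function names. The S3 listing uses boto3's `list_objects_v2` paginator; the checks themselves only need a list of (key, size) pairs, so they can be tried without S3 access.

```python
from collections import defaultdict

# Hypothetical expectations: source name -> (min files, max files, min bytes per file).
EXPECTED = {
    "source_a": (1, 100, 1024),
    "source_b": (1, 100, 1024),
}


def list_s3(bucket, prefix=""):
    """Return (key, size) pairs for a bucket; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the checks below can be run offline

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pairs = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            pairs.append((obj["Key"], obj["Size"]))
    return pairs


def check_listing(listing, expected=EXPECTED):
    """Check source presence, file counts and file sizes; return problem strings."""
    # Assumption: the first path segment of each key is the source name.
    by_source = defaultdict(list)
    for key, size in listing:
        by_source[key.split("/", 1)[0]].append((key, size))

    problems = []
    for source, (min_files, max_files, min_bytes) in expected.items():
        files = by_source.get(source, [])
        if not files:
            problems.append(f"ERROR: source {source} missing")
            continue
        if not min_files <= len(files) <= max_files:
            problems.append(
                f"WARNING: {source} has {len(files)} files, "
                f"expected {min_files}-{max_files}"
            )
        for key, size in files:
            if size < min_bytes:
                problems.append(f"WARNING: {key} is only {size} bytes")
    return problems


def check_datapackage(datapackage, listing):
    """Report listed files that no datapackage.json resource path mentions."""
    declared = set()
    for resource in datapackage.get("resources", []):
        path = resource.get("path", [])
        # A resource path may be a single string or a list of strings.
        declared.update([path] if isinstance(path, str) else path)
    return [
        f"ERROR: {key} not in datapackage.json"
        for key, _ in listing
        if key not in declared
    ]
```

Whether a missing source should be an error and an undersized file only a warning is a judgment call; the split above is just one possible choice.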

Ideally, this check would be forced to run before the aggregation script, or be integrated into it. Any warnings or errors would need to be acknowledged or resolved before the aggregation process is kicked off.
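
One possible way to enforce this, sketched in Python: turn the check results into an exit code, so aggregation only starts when the check exits with 0. The function name, the `acknowledged` flag, and the convention that each problem string starts with "ERROR" or "WARNING" are all assumptions, not existing project code.

```python
import sys


def gate(problems, acknowledged=False):
    """Return a process exit code: 0 lets aggregation start, 1 blocks it."""
    errors = [p for p in problems if p.startswith("ERROR")]
    warnings = [p for p in problems if p.startswith("WARNING")]

    # Always show everything that was found, even when the run is allowed.
    for problem in problems:
        print(problem, file=sys.stderr)

    if errors:
        return 1  # errors must be resolved; acknowledging is not enough
    if warnings and not acknowledged:
        return 1  # warnings block until someone explicitly acknowledges them
    return 0
```

A wrapper could then call `sys.exit(gate(problems, acknowledged=...))` and the aggregation script would be chained after it, for example with `&&` in a shell, so it never runs on a non-zero exit.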

Add data sanity check between ETL and aggregation #88