The files at _output are regenerated on every run. While there are checks for whether the Git repository of the data package has changed, we currently have no way to know whether the files at _output are stale. This is an issue for big datasets, where copying the CSV files to the download dir takes some time. The solution would be a cache file that records both the last commit from which a data package was generated and the checksum of each CSV data file, so we can determine whether the files are identical (and overwrite them if not).