PascalLesage / presamples

Package to write, load, manage and verify numerical arrays, called presamples.
BSD 3-Clause "New" or "Revised" License

hash should be for data, not data location #9

Closed PascalLesage closed 7 years ago

PascalLesage commented 7 years ago

The hash included in the presamples package is for the location of the data on disk. We eventually want to have data stored off-disk (see https://github.com/PascalLesage/brightway2-presamples/issues/7).
Since what we want to ensure is that the actual data passed is not corrupt, it would be preferable to hash the data (samples, parameter names, etc).

cmutel commented 7 years ago

Not sure why you think we hash the location and not the actual contents - we import bw2data.filesystem.md5, which definitely hashes the file content.

I agree that this whole system should be redesigned to account for remote resources where we don't have access to the filesystem, and can't check the contents. But I think this discussion should be postponed for now.

PascalLesage commented 7 years ago

My bad. I assumed the datapackage hash contains the hash of the filepaths; for example, this is what is stored as a hash: "hash": md5(cfs_samples_fp). But I reread the bw2data.filesystem.md5 function, and all is good.
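
For reference, the distinction discussed in this thread can be sketched as below. This is only an illustration built on Python's standard hashlib, not the actual bw2data.filesystem.md5 source; the helper names md5_of_file and md5_of_path_string, the demo function, and the file names are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_file(filepath, blocksize=65536):
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    hasher = hashlib.md5()
    with open(filepath, "rb") as f:
        # Read fixed-size blocks until an empty bytes object signals EOF.
        for chunk in iter(lambda: f.read(blocksize), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def md5_of_path_string(filepath):
    """Return the MD5 hex digest of the path text itself (the mistaken reading)."""
    return hashlib.md5(str(filepath).encode("utf-8")).hexdigest()


def demo():
    """Write identical bytes to two different paths and compare both kinds of hash."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.npy")  # hypothetical file names
        b = os.path.join(tmp, "b.npy")
        for fp in (a, b):
            with open(fp, "wb") as f:
                f.write(b"same bytes")
        return {
            "content_equal": md5_of_file(a) == md5_of_file(b),
            "path_equal": md5_of_path_string(a) == md5_of_path_string(b),
        }
```

With identical bytes stored at two different paths, the content hashes match while the path-string hashes differ, which is why a call like md5(cfs_samples_fp) that opens and reads the file verifies the data rather than its location.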