NSAPH-Projects / space

SpaCE, the Spatial Confounding Environment, loads benchmark datasets for causal inference methods tackling spatial confounding
https://nsaph-projects.github.io/space/
MIT License

Add checksum value to data sets #95

Open Naeemkh opened 1 year ago

Naeemkh commented 1 year ago

In the SpaceEnv.download_data function, we only check whether the data directory exists; if it does, we skip the download. If a file inside it is missing, a 'file not found' error will eventually be triggered, but this approach won't detect files that have been modified. It would be beneficial to store checksum values for the data in the master file. When we need to validate the data, we can then compare the stored checksums with checksums computed from the current data.
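A minimal sketch of the comparison step, not tied to the current SpaCE code. The helper names (file_checksum, verify_files), the shape of the expected mapping, and the choice of SHA-256 as the default are all assumptions for illustration; the master file could store the values in any format.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large data sets need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(data_dir, expected, algorithm="sha256"):
    """Compare stored checksums with the files currently on disk.

    `expected` is a hypothetical mapping {filename: hex digest}, e.g. as it
    might be read from the master file. Returns {filename: status}, where
    status is 'ok', 'missing', or 'modified'.
    """
    data_dir = Path(data_dir)
    status = {}
    for name, want in expected.items():
        path = data_dir / name
        if not path.is_file():
            status[name] = "missing"
        elif file_checksum(path, algorithm) != want.lower():
            status[name] = "modified"
        else:
            status[name] = "ok"
    return status
```

With something like this, download_data could re-download whenever any entry is not 'ok', instead of relying only on the directory check.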

mauriciogtec commented 11 months ago

@atrisovic Does Dataverse have some built-in functionality for checksums?

atrisovic commented 10 months ago

Yes, it does. It's worth double-checking whether it's incorporated in pyDV; if not, it's not hard to implement.
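If it has to be implemented by hand, a rough sketch could look like the following. It assumes the file listing returned by the Dataverse native API has already been fetched and parsed into a dict, and that each entry carries a dataFile object with a filename and a checksum of the form {"type": ..., "value": ...}; that shape should be checked against the actual response. The function name expected_checksums is hypothetical.

```python
def expected_checksums(file_listing):
    """Extract {filename: (hashlib algorithm name, hex digest)} from a listing.

    `file_listing` is assumed to be the parsed JSON of a Dataverse file
    listing, with entries under "data" and per-file details under "dataFile".
    """
    expected = {}
    for entry in file_listing.get("data", []):
        data_file = entry.get("dataFile", {})
        name = data_file.get("filename")
        checksum = data_file.get("checksum")
        if name and checksum:
            # Assumed type labels such as "MD5" or "SHA-256" are normalized
            # to hashlib-style names ("md5", "sha256").
            algorithm = checksum["type"].replace("-", "").lower()
            expected[name] = (algorithm, checksum["value"])
    return expected
```

Each value can then be compared against a digest computed locally with hashlib.new(algorithm) over the downloaded file.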