gdcc / pyDataverse

Python module for Dataverse Software (dataverse.org).
http://pydataverse.readthedocs.io/
MIT License

Verify file integrity of downloaded files by hash sum #115

Open skasberger opened 3 years ago

skasberger commented 3 years ago

Verify the integrity of downloaded files by comparing their hash values against the expected checksums. Mentioned in a call by @atrisovic.

Prepare

Implementation

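import hashlib
from pyDataverse.api import NativeApi

api = NativeApi("https://data.aussda.at")
# Download the datafile with ID 3702.
resp = api.get_datafile(3702)
# Pick the hash algorithm that matches the published checksum.
m = hashlib.md5()
# m = hashlib.sha1()
# m = hashlib.sha256()
# m = hashlib.sha512()
m.update(resp.content)
# Hex digest of the downloaded bytes, to compare with the expected value.
print(m.hexdigest())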
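
The snippet above only computes a digest; the verification step still needs a comparison against an expected value. A minimal sketch of that comparison follows. The helper name `verify_checksum` is hypothetical and not part of pyDataverse, and it assumes the expected checksum and its type label (for example "MD5" or "SHA-1") have already been obtained from the file's metadata by some other means.

```python
import hashlib


def verify_checksum(content: bytes, checksum_type: str, expected: str) -> bool:
    """Return True if the digest of `content` matches `expected`.

    Hypothetical helper, not part of pyDataverse. `checksum_type` is assumed
    to be a label such as "MD5", "SHA-1", "SHA-256" or "SHA-512".
    """
    # Normalize labels such as "SHA-256" to hashlib names such as "sha256".
    algorithm = checksum_type.replace("-", "").lower()
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"Unsupported checksum type: {checksum_type}")
    digest = hashlib.new(algorithm, content).hexdigest()
    # Hex digests are case-insensitive, so compare in lower case.
    return digest == expected.strip().lower()
```

With the download from the snippet above, this could be called as `verify_checksum(resp.content, "MD5", expected)`, where `expected` is the checksum string published for the file.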

Review

Follow-Ups

atrisovic commented 3 years ago

Hey @skasberger!

This is how I handled checksum verification in a previous project: https://github.com/atrisovic/dataverse-r-study/blob/0fc1c223ed0a0777633f94f9b7cad699003aaf7a/docker/download_dataset.py#L32-L39

I tried working that code into the client, but I think doing it the same way there is quite awkward. I can still share the code if you think it would be helpful, but I think a different approach is needed x)

pdurbin commented 4 months ago

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python