Accenture / AmpliGraph

Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org
Apache License 2.0
2.14k stars, 251 forks

Add MD5 checksum for datasets #47

Closed by lukostaz 5 years ago

lukostaz commented 5 years ago

Description: Each dataset loader should accept an argument check_MD5 (False by default) that, when enabled, verifies the MD5 checksum of the downloaded dataset.
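A minimal sketch of what such a check could look like, using only the standard library. The names `md5_of_file`, `load_dataset_file`, and `expected_md5` are hypothetical and are not taken from the AmpliGraph code base; only the `check_MD5` flag and its `False` default come from the description above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_dataset_file(path, expected_md5=None, check_MD5=False):
    """Hypothetical loader: return the raw file bytes, optionally verifying MD5 first."""
    path = Path(path)
    if check_MD5:
        if expected_md5 is None:
            raise ValueError("check_MD5=True requires an expected_md5 value")
        actual = md5_of_file(path)
        if actual != expected_md5.lower():
            raise IOError(
                f"MD5 mismatch for {path.name}: expected {expected_md5}, got {actual}"
            )
    return path.read_bytes()
```

With the flag left at its default the file is returned without any hashing, so existing callers would see no change in behaviour or speed.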

rorymcgrath commented 5 years ago

Do MD5 checksums for these datasets already exist, provided by the original papers, or will we be generating them ourselves?

Currently we don't seem to have them.

If we are using them to validate that the datasets are the same as those used in the referenced papers, we need to get their original MD5 checksums.

If we are using them only to validate that the download was not corrupted, generating them ourselves will be fine.

lukostaz commented 5 years ago

Original datasets do not come with a checksum, unfortunately. We will have to generate checksums from a version we download from their original URLs:
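As an illustration only (not the project's actual script), reference checksums for a locally downloaded copy could be generated along these lines. The function names and the JSON manifest format are assumptions made for this sketch.

```python
import hashlib
import json
from pathlib import Path


def build_md5_manifest(dataset_dir):
    """Map each file path (relative to dataset_dir) to its hex MD5 digest."""
    root = Path(dataset_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for small benchmark files,
            # switch to chunked reads for very large ones.
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def write_manifest(dataset_dir, manifest_path):
    """Write the manifest as JSON and return it (hypothetical storage format)."""
    manifest = build_md5_manifest(dataset_dir)
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Checksums produced this way only show that a later download matches the copy we hashed; they say nothing about whether that copy matches the data used in the original papers.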

lukostaz commented 5 years ago

Just noticed the check_md5hash argument is not documented in docstrings. Can you please amend?
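For reference, a docstring entry for this argument might read roughly as follows. This is a sketch in NumPy docstring style; the loader name `load_example_dataset`, the wording, and the placeholder return value are hypothetical, and only the `check_md5hash` argument name comes from the comment above.

```python
def load_example_dataset(check_md5hash=False):
    """Load a hypothetical example dataset.

    Parameters
    ----------
    check_md5hash : bool, optional
        If True, compare the MD5 hash of each downloaded file with the stored
        reference value before loading. Default is False.

    Returns
    -------
    dict
        Placeholder splits keyed by 'train', 'valid' and 'test'.
    """
    # Placeholder body: a real loader would read the dataset files here and,
    # when check_md5hash is True, verify their hashes first.
    return {"train": [], "valid": [], "test": []}
```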