HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Support Domain Checksums #48

Closed jreadey closed 4 years ago

jreadey commented 4 years ago

Enable domain checksums: an aggregation of the ETag values of all objects within a domain. These will be computed asynchronously, using the same process as for domain info.

jreadey commented 4 years ago

Changed the REST API for requesting a rescan: https://github.com/HDFGroup/hsds/commit/6b6680aafccbb916419de952e09db41e7632a53d

jreadey commented 4 years ago

This should be working. An MD5 checksum is computed over all chunks and metadata objects in the domain, other than the domain JSON (i.e. changing the ACLs doesn't modify the checksum).
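To illustrate the idea of aggregating per-object ETags into one domain checksum, here is a minimal sketch. It is not the HSDS implementation: the function name, the dict-of-ETags input, the sort-then-hash scheme, and the ".domain.json" key are all assumptions made for the example.

```python
import hashlib


def aggregate_etags(etags_by_key, skip_keys=()):
    """Fold per-object ETags into a single MD5 hex digest (illustrative only).

    etags_by_key: hypothetical mapping of storage key -> ETag string.
    skip_keys: keys to leave out, e.g. the domain JSON, so that ACL
               changes do not affect the result.
    """
    digest = hashlib.md5()
    # Sort the keys so the result does not depend on listing order.
    for key in sorted(etags_by_key):
        if key in skip_keys:
            continue
        digest.update(etags_by_key[key].encode("utf-8"))
    return digest.hexdigest()


# Hypothetical keys and ETags, for illustration only.
etags = {"c-0001": "aaa", "g-0001": "bbb", ".domain.json": "zzz"}
checksum = aggregate_etags(etags, skip_keys={".domain.json"})
```

With this scheme, changing a chunk's ETag changes the aggregate, while changing only the skipped domain JSON entry leaves it the same.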

hsinfo can be used to show the checksum for a given domain, e.g.:

$ hsinfo  /home/test_user1/test/snp500.h5
domain: /home/test_user1/test/snp500.h5
owner:           test_user1
id:              g-bc48beca-7da3c61f-34c0-3642f5-2b4e97
last modified:   2020-06-17 11:02:56
last scan:       2020-06-17 11:03:05
md5 sum:         fd2c04b0e4eadb603e310933104b93ee
total_size:      113402127
allocated_bytes: 113400000
metadata_bytes:  1863
num objects:     2
num chunks:      54

Existing domains won't have a checksum computed until the next time they are updated. You can use hsinfo --rescan <domain> to force the checksum to be computed.
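If you want to compare checksums from a script, one option is to parse the text that hsinfo prints. The sketch below assumes the "key: value" layout shown in the example output above; parse_hsinfo is a hypothetical helper, not part of the HSDS tools, and the output format may differ between versions.

```python
def parse_hsinfo(text):
    """Parse 'key: value' lines from hsinfo output into a dict (illustrative)."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# Excerpt copied from the example output in this thread.
sample = """\
domain: /home/test_user1/test/snp500.h5
last modified:   2020-06-17 11:02:56
md5 sum:         fd2c04b0e4eadb603e310933104b93ee
num chunks:      54
"""

info = parse_hsinfo(sample)
md5_sum = info.get("md5 sum")  # None if no checksum has been computed yet
```

Two copies of a domain could then be compared by checking that their "md5 sum" values match.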