ImagingDataCommons / TCIA-IDC-Coordination


[TH-46248] API endpoint to get hash at the series/instance level #15

Closed: fedorov closed this issue 3 years ago

fedorov commented 4 years ago

TCIA issue: https://help.cancerimagingarchive.net/servicedesk/customer/portal/1/TH-46248

Such an API endpoint would allow us to confirm the consistency of each downloaded item.

IDC needs to replicate TCIA content, and we currently have no means of verifying the consistency of the downloaded content. To work around this missing functionality, we have to download the content twice and confirm the consistency of the download by comparing the hashes from the two attempts.
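A minimal sketch of the download-twice workaround, assuming each attempt has been extracted into its own directory with the same file layout. It hashes every file instead of an archive, because a re-generated archive (for example a zip with embedded timestamps) could differ byte-for-byte even when its contents are identical. The function names are hypothetical and not part of any TCIA or IDC tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 hex digest."""
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def downloads_match(first: Path, second: Path) -> bool:
    """True if two independent download attempts contain identical files."""
    return md5_manifest(first) == md5_manifest(second)
```

This doubles the transfer cost, which is exactly what a hash published by TCIA would avoid: a single download could be checked against the published value.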

fedorov commented 4 years ago

Comment from @bcli4d:

An MD5 hash at the instance level would be best, because we could then verify it against the MD5 hash that is available on GCS blobs.
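One detail to keep in mind when comparing against GCS: the object's `md5Hash` metadata (exposed as `Blob.md5_hash` in the `google-cloud-storage` Python client) is the base64 encoding of the raw 16-byte digest, not a hex string, and composite objects do not carry an MD5 at all. The sketch below assumes, hypothetically, that a TCIA endpoint would return a hex MD5 per instance, and converts it before comparing.

```python
import base64
import hashlib


def gcs_style_md5(data: bytes) -> str:
    """Base64-encode the raw MD5 digest, the encoding GCS uses for md5Hash."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def hex_to_gcs_md5(hex_digest: str) -> str:
    """Convert a hex MD5 (assumed TCIA format) to GCS's base64 form."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def instance_matches_blob(tcia_md5_hex: str, blob_md5_b64: str) -> bool:
    """Compare a (hypothetical) TCIA instance hash with a GCS blob's md5Hash."""
    return hex_to_gcs_md5(tcia_md5_hex) == blob_md5_b64
```

With this, an instance copied into a GCS bucket can be checked without re-downloading it, since GCS already computes the MD5 on upload.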

Related tickets in the NBIA Jira from @ulrikew:

bcli4d commented 3 years ago

An instance-level hash is optimal, but a series-level hash is sufficient: we download at series granularity, so we can validate such downloads if we have a series-level hash. If there were an API that returned a list of instance-level hashes for a series, we could download just those instances that have changed, which would be somewhat more efficient in cases where an instance is revised.
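A sketch of the incremental update that a per-series list of instance hashes would enable. It assumes a hypothetical endpoint response already parsed into a mapping from SOPInstanceUID to MD5, and a matching local mapping for the copies IDC already holds; both sides must use the same hash encoding. The function name and data shapes are illustrative only.

```python
def plan_series_update(
    remote: dict[str, str], local: dict[str, str]
) -> tuple[set[str], set[str]]:
    """Decide which instances of one series to fetch and which to drop.

    remote: SOPInstanceUID -> MD5 as reported by TCIA (hypothetical API).
    local:  SOPInstanceUID -> MD5 of the copies IDC already has.
    Returns (to_download, to_delete).
    """
    # New instances (absent locally) and revised ones (hash differs).
    to_download = {uid for uid, h in remote.items() if local.get(uid) != h}
    # Instances IDC still holds but TCIA no longer lists for this series.
    to_delete = set(local) - set(remote)
    return to_download, to_delete
```

A single series-level hash can only say that something in the series changed, which forces a full re-download; the per-instance list narrows that to the instances that actually differ.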