Closed by fedorov 3 years ago.
Comment from @bcli4d:
An MD5 hash at the instance level would be best, because we could then verify it against the MD5 hash that is available for GCS blobs.
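A minimal sketch of that check, assuming the instance has already been downloaded to a local file. GCS reports an object's MD5 as a base64-encoded digest (the `md5Hash` field in the JSON API, `Blob.md5_hash` in the Python client), so the local digest has to be base64-encoded before comparing. Composite objects carry no MD5 at all. The function names below are hypothetical, and fetching the blob metadata is left out.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the MD5 digest of `data`, base64-encoded the way GCS reports it."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def file_md5_base64(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a local file through MD5 in chunks and return the base64 digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")


def verify_against_gcs(path: str, gcs_md5_hash: str) -> bool:
    """True if the local file's digest equals the blob's md5Hash value."""
    return file_md5_base64(path) == gcs_md5_hash
```

For example, `md5_base64(b"")` gives `1B2M2Y8AsgTpgAmY7PhCfg==`, the value GCS shows for an empty object.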
Related tickets in the NBIA Jira from @ulrikew:
An instance-level hash is optimal, but a series-level hash is sufficient: since we download at series granularity, we can validate such downloads if we have a series-level hash. If there were an API that returned a list of instance-level hashes for a series, we could download just those instances that have changed, which would be somewhat more efficient in cases where an instance is revised.
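To illustrate the instance-level variant, here is a sketch that assumes a hypothetical endpoint returning a mapping from SOPInstanceUID to MD5 hex digest for one series; no such TCIA/NBIA API is confirmed in this thread. Comparing that mapping with the hashes recorded from a previous download yields the instances to fetch again. The series-level hash here (MD5 over the sorted `uid:digest` lines) is likewise an assumed construction, not a TCIA definition.

```python
import hashlib
from typing import Dict, List


def instances_to_download(remote: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Return SOPInstanceUIDs that are new or whose hash changed, sorted.

    `remote` is the hypothetical per-series listing {SOPInstanceUID: md5_hex};
    `local` holds the hashes recorded for instances already downloaded.
    """
    return sorted(uid for uid, digest in remote.items() if local.get(uid) != digest)


def series_md5(instance_hashes: Dict[str, str]) -> str:
    """Assumed series-level hash: MD5 over sorted 'uid:md5' lines.

    Sorting by UID makes the result independent of listing order.
    """
    h = hashlib.md5()
    for uid in sorted(instance_hashes):
        h.update(f"{uid}:{instance_hashes[uid]}\n".encode("ascii"))
    return h.hexdigest()
```

With `remote = {"1.2.3.1": "aaa", "1.2.3.2": "bbb", "1.2.3.3": "ccc"}` and `local = {"1.2.3.1": "aaa", "1.2.3.2": "old"}`, only `1.2.3.2` (revised) and `1.2.3.3` (new) would be re-downloaded.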
TCIA issue: https://help.cancerimagingarchive.net/servicedesk/customer/portal/1/TH-46248
Such an API endpoint would allow us to confirm the consistency of the downloaded items.
IDC needs to replicate TCIA content, but we currently have no means to verify the consistency of the downloaded content. To work around the lack of this functionality, we have to download the content twice and confirm consistency by comparing the hashes from the two attempts.
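A minimal sketch of that double-download workaround, assuming each attempt is saved into its own local directory (the directory layout and function names are hypothetical): hash every file in both trees, then report any path whose digest differs or that exists in only one attempt.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_tree(root: str) -> Dict[str, str]:
    """Map each file's path (relative to `root`) to its MD5 hex digest."""
    base = Path(root)
    digests: Dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            h = hashlib.md5()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(base).as_posix()] = h.hexdigest()
    return digests


def diff_digests(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Paths whose digests differ or that are present in only one mapping."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


def inconsistent_files(first_root: str, second_root: str) -> List[str]:
    """Compare two download attempts; an empty list means they agree."""
    return diff_digests(digest_tree(first_root), digest_tree(second_root))
```

An empty result only shows that the two attempts agree with each other, not that either matches the source, which is why a server-provided hash is the better solution.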