ImagingDataCommons / SlicerIDCBrowser

A 3D Slicer extension to support access to the content of NCI Imaging Data Commons
BSD 3-Clause "New" or "Revised" License

Do not download what is already downloaded #10

Open fedorov opened 1 year ago

fedorov commented 1 year ago

Content already downloaded will be downloaded again if requested by the user.

This is a bit tricky, since we cannot rely on UIDs alone to confirm that the binary content in the DICOM DB is the same as in IDC. Maybe we should keep track of what was already downloaded in a separate cache, and keep the hash? We also need to check whether the hash is returned by the API.

There is also the situation where a user downloaded content via the extension but then deleted it from the DICOM DB, or (highly unlikely, but not impossible) deleted it from the DICOM DB and then imported into the DB an instance that has the same UIDs but is of a different version and has a different hash...
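
To make the separate-cache idea concrete, here is a minimal sketch, not the extension's actual code. The `DownloadCache` class, the JSON manifest layout, and the choice of SHA-256 are all assumptions made for illustration; whether the IDC API returns a hash, and with which algorithm, still needs to be checked. The sketch tracks files on disk rather than the state of the Slicer DICOM DB, so it only covers the "file was deleted or replaced" cases when the DB keeps files at the recorded path.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large DICOM files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DownloadCache:
    """Hypothetical manifest mapping a UID to the downloaded file path and its hash."""

    def __init__(self, manifest_path: Path):
        self.manifest_path = manifest_path
        self.entries = {}
        if manifest_path.exists():
            self.entries = json.loads(manifest_path.read_text())

    def record(self, uid: str, path: Path) -> None:
        """Remember that `uid` was downloaded to `path`, along with its content hash."""
        self.entries[uid] = {"path": str(path), "sha256": file_sha256(path)}
        self.manifest_path.write_text(json.dumps(self.entries, indent=2))

    def needs_download(self, uid: str, remote_hash=None) -> bool:
        """Return True unless the file is present, unchanged, and matches the remote hash."""
        entry = self.entries.get(uid)
        if entry is None:
            return True  # never downloaded through this cache
        path = Path(entry["path"])
        if not path.exists():
            return True  # downloaded earlier, but deleted since
        local_hash = file_sha256(path)
        if local_hash != entry["sha256"]:
            return True  # same UID, different bytes on disk
        if remote_hash is not None and remote_hash != local_hash:
            return True  # remote content changed (assumes the same hash algorithm)
        return False
```

The download code would call `needs_download(uid, remote_hash)` before fetching and `record(uid, path)` after a successful fetch. Re-hashing on every check is slow for large collections; comparing size and modification time first would be a cheaper pre-filter.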

vkt1414 commented 7 months ago

While working on download-progress reporting in idc-index, I remembered this issue. An elegant way to avoid downloading what is already downloaded may be to use the sync feature in s5cmd. What do you think? We could implement it right in idc-index, so downloading (or rather syncing, in the future) might be faster: https://github.com/peak/s5cmd?tab=readme-ov-file#sync
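
A rough sketch of what calling `s5cmd sync` from Python could look like, not how idc-index does it. The helper names `build_sync_command` and `sync_folder` are hypothetical, as are the bucket URL and target directory in the usage note. As far as I understand the s5cmd README, `sync` decides what to copy by comparing size and modification time (or size only with `--size-only`), not content hashes, so it would not by itself address the "same UIDs, different hash" concern raised above.

```python
import shutil
import subprocess


def build_sync_command(source_url, target_dir, endpoint_url=None):
    """Assemble the s5cmd argument list for syncing a remote prefix to a local folder."""
    cmd = ["s5cmd", "--no-sign-request"]  # anonymous access, assuming a public bucket
    if endpoint_url:
        cmd += ["--endpoint-url", endpoint_url]
    cmd += ["sync", source_url, target_dir]
    return cmd


def sync_folder(source_url, target_dir, endpoint_url=None):
    """Run s5cmd sync; files already present and unchanged locally are skipped by s5cmd."""
    if shutil.which("s5cmd") is None:
        raise RuntimeError("s5cmd executable not found on PATH")
    cmd = build_sync_command(source_url, target_dir, endpoint_url)
    return subprocess.run(cmd, check=True)
```

For example, `sync_folder("s3://example-bucket/series-folder/*", "downloads/series-folder/")` (placeholder names) would only transfer files that are missing or that differ locally.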