ImagingDataCommons / idc-index

Python package to simplify access to the data available from NCI Imaging Data Commons
https://idc-index.readthedocs.io/
MIT License

Update download functions/tools to support selection by IDs/UIDs from prior versions #128

Open fedorov opened 4 weeks ago

fedorov commented 4 weeks ago

Currently, we support download of the data from prior versions when the content is referenced from the manifest.

However, the download_from_selection function and the idc download-from-selection/download command-line tools do not consult the prior versions index.

In v19, CMB radiology collections were updated to fix the incorrect PatientID, which resulted in updates to the DICOM UIDs. As reported by @LennyN95, the UIDs from the prior version were used in MHub regression testing, and became inaccessible as a result.

fedorov commented 3 weeks ago

Per discussion with @LennyN95, the following plan emerged:

  1. we keep the current behavior, where past versions can only be downloaded using a manifest and CRDC UUIDs. SeriesInstanceUIDs that are not found in the latest version will not be downloaded.
  2. we add an option, while downloading, to generate the manifest corresponding to what is being downloaded, and include the version of IDC (and of idc-index) that was used to download and create that manifest.
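
To make the plan above concrete, here is a rough sketch of what "generate a manifest for what is being downloaded" could look like. It is not idc-index code: build_manifest is a hypothetical helper, the index is stood in for by a plain list of dicts, and the column names (SeriesInstanceUID, crdc_series_uuid, series_aws_url), the "#" header comments, and the s5cmd-style "cp" lines are assumptions about the manifest layout. The bucket and UID values are made up.

```python
from typing import Dict, Iterable, List, Tuple


def build_manifest(
    index_rows: Iterable[Dict[str, str]],
    requested_uids: Iterable[str],
    idc_version: str,
    idc_index_version: str,
) -> Tuple[List[str], List[str]]:
    """Return (manifest_lines, missing_uids) for the requested series.

    Hypothetical helper: index_rows stands in for the current-version index.
    """
    # Look up rows of the current-version index by SeriesInstanceUID.
    by_uid = {row["SeriesInstanceUID"]: row for row in index_rows}

    # Record which versions produced this manifest (item 2 of the plan).
    lines = [
        f"# IDC version: {idc_version}",
        f"# idc-index version: {idc_index_version}",
    ]
    missing: List[str] = []
    for uid in requested_uids:
        row = by_uid.get(uid)
        if row is None:
            # Item 1 of the plan: UIDs absent from the latest version are skipped.
            missing.append(uid)
            continue
        # Keep the versioned CRDC UUID next to the DICOM UID for later reference.
        lines.append(
            f"# SeriesInstanceUID: {uid} crdc_series_uuid: {row['crdc_series_uuid']}"
        )
        # Assumed s5cmd-style copy line pointing at the versioned location.
        lines.append(f"cp {row['series_aws_url']} .")
    return lines, missing


if __name__ == "__main__":
    # Made-up index content and UIDs, for illustration only.
    sample_index = [
        {
            "SeriesInstanceUID": "1.2.3",
            "crdc_series_uuid": "aaaa",
            "series_aws_url": "s3://example-bucket/aaaa/*",
        }
    ]
    manifest, skipped = build_manifest(sample_index, ["1.2.3", "9.9.9"], "v19", "0.0.0")
    print("\n".join(manifest))
    print("skipped:", skipped)
```

Because the manifest references the CRDC UUID-based location rather than the SeriesInstanceUID, a file like this would stay usable even after the series is revised in a later IDC version, which is the scenario that broke the MHub regression tests.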

Related to this I guess is the issue #117 to allow referencing individual items by crdc_series_uuid in the command line tools.

Leo, I have to say I am still in doubt re item 1 above. If a user stored SeriesInstanceUID from a version, and then that one was deprecated, wouldn't it be confusing not to be able to access it?

I may also bring it up at the IDC weekly meeting next week. It's an interesting question.

fedorov commented 6 days ago

#117 is resolved, and it is now possible to download series by crdc_series_uuid using both the command line and the API.