ImagingDataCommons / idc-index-data

Python package providing the index to query and download data hosted by the NCI Imaging Data Commons
MIT License

ENH: add metadata necessary to download from previous idc-versions #32

Closed: vkt1414 closed this pull request 4 months ago

vkt1414 commented 4 months ago

Pulls the strictly necessary metadata to allow downloading data from previous IDC versions.

solves https://github.com/ImagingDataCommons/idc-index/issues/100

fedorov commented 4 months ago

Wow, this looks really great from the description; I will review tomorrow!

How about renaming the query and the result to prior_versions_index? Or some other ..._index.

It is a key step towards solving #100, but to actually solve it, we would need to integrate handling of this additional table in idc-index.

vkt1414 commented 4 months ago

@fedorov I just want to make sure you want to update the latest_idc_version variable yourself.

https://github.com/ImagingDataCommons/idc-index-data/pull/32/files#diff-1b8acf7ccf6a7bc784d04a22bb82263a847dacb31532f44d6c046ada6540e24eR3

fedorov commented 4 months ago

> @fedorov I just want to make sure you want to update the latest_idc_version variable yourself.

As opposed to running a daily GitHub Action and doing this automatically? If that's the question, then yes, I don't see a need to automate that.

vkt1414 commented 4 months ago

> > @fedorov I just want to make sure you want to update the latest_idc_version variable yourself.
>
> As opposed to running a daily GitHub Action and doing this automatically? If that's the question, then yes, I don't see a need to automate that.

Not a GitHub Action. Initially I had the following code to get the latest IDC version dynamically. I hardcoded it to v18 for now, because Bill still needs to update the table bigquery-public-data.idc_current.version_metadata. You removed it in your suggestion, so I wanted to know whether you'll change the value yourself when switching IDC versions. It looks like you will, so I'm OK with that.

    SET latest_idc_version = (
      SELECT 18
      --SELECT max(idc_version)
      --FROM
      --`bigquery-public-data.idc_current.version_metadata`
    );
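The choice being discussed is between two forms of the same initialization: the pinned "SELECT 18" and the commented-out max(idc_version) lookup. Below is a minimal sketch of toggling between them. It assumes the query text is assembled from Python. The helper name latest_idc_version_sql and its pinned_version parameter are hypothetical and are not part of idc-index-data. The sketch also assumes that latest_idc_version has already been declared with DECLARE earlier in the script, because BigQuery scripting requires a declaration before SET.

```python
# Hypothetical helper (not part of idc-index-data): builds the BigQuery SET
# statement for latest_idc_version, either pinned to a hardcoded IDC release
# or looked up from the version_metadata table.
# Assumes latest_idc_version was already DECLAREd earlier in the script.
from typing import Optional

# Backticks are needed because the project ID contains hyphens.
VERSION_METADATA_TABLE = "`bigquery-public-data.idc_current.version_metadata`"


def latest_idc_version_sql(pinned_version: Optional[int] = 18) -> str:
    """Return the SET statement that initializes latest_idc_version.

    pinned_version: hardcoded IDC release to use while version_metadata is
    out of date. Pass None to switch to the dynamic max() lookup.
    """
    if pinned_version is not None:
        # int() keeps anything other than a plain number out of the SQL text.
        body = f"SELECT {int(pinned_version)}"
    else:
        body = f"SELECT max(idc_version)\n  FROM {VERSION_METADATA_TABLE}"
    return f"SET latest_idc_version = (\n  {body}\n);"


if __name__ == "__main__":
    print(latest_idc_version_sql())      # pinned form, as currently in the PR
    print(latest_idc_version_sql(None))  # dynamic form, once the table is fixed
```

Passing None reproduces the commented-out lookup. That form is only reliable once version_metadata is kept up to date, which is why the value stays pinned for now.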

fedorov commented 4 months ago

I am pretty sure you had it hardcoded as well; if I remember correctly, I just replaced "SELECT 18" with a variable initialization. The other lines were commented out. I don't have any problem keeping the commented-out lines in the query, if you prefer. We can uncomment them whenever that table is fixed. I don't recall Bill mentioning that he is working on it, so I don't know the timeline.

vkt1414 commented 4 months ago

> I am pretty sure you had it hardcoded as well; if I remember correctly, I just replaced "SELECT 18" with a variable initialization. The other lines were commented out. I don't have any problem keeping the commented-out lines in the query, if you prefer. We can uncomment them whenever that table is fixed. I don't recall Bill mentioning that he is working on it, so I don't know the timeline.

I added the commented-out code back. Once Bill updates the table, we can revisit the query.