ImagingDataCommons / idc-index-data

Python package providing the index to query and download data hosted by the NCI Imaging Data Commons
MIT License

enh: simplify the query by using aws_bucket #20

Closed fedorov closed 4 months ago

fedorov commented 4 months ago

This way we avoid parsing the bucket name from aws_url.
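To illustrate the difference, here is a minimal sketch, not the package's actual code. It assumes each row carries an `aws_url` of the form `s3://<bucket>/<key>` alongside an `aws_bucket` column; the bucket name, key, and the helper `bucket_from_url` are hypothetical.

```python
from urllib.parse import urlparse


def bucket_from_url(aws_url: str) -> str:
    """Extract the bucket name from an s3:// URL (the step this change avoids)."""
    # For s3://bucket/key, the bucket is the network-location part of the URL.
    return urlparse(aws_url).netloc


# Hypothetical row, for illustration only.
row = {
    "aws_url": "s3://example-bucket/series-1/instance-1.dcm",
    "aws_bucket": "example-bucket",
}

parsed = bucket_from_url(row["aws_url"])  # derived by parsing the URL
direct = row["aws_bucket"]                # read straight from the column
```

Reading the column directly removes the dependency on the URL layout.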

vkt1414 commented 4 months ago

There's an even more precise way! We already have series_aws_url directly in dicom_all. We still need to append * at the end.

fedorov commented 4 months ago

Ah, I missed it!

But I think we should revisit this to be more precise. I don't think it is good to change the value of series_aws_url by appending *. Instead, I think we should add it on the fly while building the manifest or the s5cmd download operation. Otherwise it is very confusing that the value differs between idc-index and the BQ index.
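A minimal sketch of what "on the fly" could look like, under stated assumptions: the index stores series_aws_url unchanged as a prefix such as `s3://<bucket>/<series>/`, and the wildcard is appended only when the s5cmd `cp` lines are generated. The function names, the example URLs, and the default download directory are hypothetical, not part of idc-index.

```python
from typing import Iterable


def to_s5cmd_pattern(series_aws_url: str) -> str:
    """Return a wildcard pattern matching every object under a series prefix."""
    # Assumption: the stored value is a prefix, with or without a trailing slash.
    if series_aws_url.endswith("/"):
        prefix = series_aws_url
    else:
        prefix = series_aws_url + "/"
    return prefix + "*"


def build_manifest(series_aws_urls: Iterable[str], download_dir: str = ".") -> str:
    """Build one s5cmd 'cp' line per series without changing the index values."""
    return "\n".join(
        f"cp {to_s5cmd_pattern(url)} {download_dir}" for url in series_aws_urls
    )


# Hypothetical index values, left exactly as stored.
urls = ["s3://example-bucket/series-1/", "s3://example-bucket/series-2"]
manifest = build_manifest(urls, download_dir="./data")
```

With this split, the value in the index stays identical to the one in BigQuery, and only the generated manifest contains the wildcard.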

vkt1414 commented 4 months ago

Agreed! Should we create an issue to track this, or just go ahead and create another PR? I ask because we have yet to move to Parquet in idc-index (https://github.com/ImagingDataCommons/idc-index/pull/57).

fedorov commented 4 months ago

I suggest we move to Parquet first, then update to v18, and then deal with the other issues.