ImagingDataCommons / idc-index-data

Python package providing the index to query and download data hosted by the NCI Imaging Data Commons
MIT License

Change index to parquet format #16

Closed vkt1414 closed 7 months ago

vkt1414 commented 7 months ago

Parquet offers several advantages over CSV, particularly when used in combination with DuckDB.

DuckDB can query a Parquet file directly instead of first loading the entire file into a dataframe. Because Parquet is a columnar format, only the columns a query needs are read into memory, rather than all of them as happens with the CSV/pandas combination.