ImagingDataCommons / idc-index-data

Python package providing the index to query and download data hosted by the NCI Imaging Data Commons
MIT License

Switch index from CSV format to parquet #17

Closed vkt1414 closed 4 months ago

vkt1414 commented 4 months ago

Parquet offers several advantages over CSV, particularly when used in combination with DuckDB.

DuckDB can query a Parquet file directly instead of first loading the entire file into a dataframe. Because Parquet is a columnar format, only the columns a query needs are read into memory, rather than all of them as with the CSV/pandas combination.