childmindresearch / bids2table

Efficiently index large-scale BIDS neuroimaging datasets and derivatives
https://childmindresearch.github.io/bids2table/
MIT License
13 stars, 5 forks

Crawl BIDS dirs more efficiently #3

Closed: clane9 closed this 1 year ago

clane9 commented 1 year ago

The previous implementation used iglob to scan the BIDS root directory recursively. Profiling on the large ABCD dataset showed that up to 60% of the run time was spent on this scan alone.
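For reference, a file-level crawl can be sketched roughly as follows. This is a minimal illustration, not the previous bids2table code. It assumes that each worker ran the full recursive iglob over the root and kept only the files whose path hashes to its own partition. The names `stable_hash` and `iter_files_file_level` are hypothetical.

```python
import glob
import os
import tempfile
import zlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() for str is salted per process, so a
    # deterministic checksum keeps assignments consistent across workers.
    return zlib.crc32(key.encode("utf-8"))


def iter_files_file_level(root: str, worker_id: int, num_workers: int):
    """Hypothetical file-level crawl: each worker globs the whole tree
    and keeps only the files whose full path hashes to its partition."""
    pattern = os.path.join(root, "**", "*")
    for path in glob.iglob(pattern, recursive=True):
        if not os.path.isfile(path):
            continue  # iglob also yields directories
        if stable_hash(path) % num_workers == worker_id:
            yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for sub in ("sub-01", "sub-02"):
            os.makedirs(os.path.join(root, sub, "anat"))
            for name in ("T1w.nii.gz", "T1w.json"):
                open(os.path.join(root, sub, "anat", f"{sub}_{name}"), "w").close()
        for wid in range(2):
            files = iter_files_file_level(root, wid, 2)
            print(wid, sorted(os.path.relpath(p, root) for p in files))
```

Under this assumption every worker pays for the complete recursive scan no matter how few files it keeps, so the scan cost is repeated once per worker.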

Here we use a different source that yields subject directories rather than individual files. This saves work for the individual workers, since hash partitioning now happens at the higher level of subject directories instead of individual files.
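A subject-level source might look like the sketch below. It is hypothetical (function names included) and assumes a raw BIDS layout with `sub-<label>` directories directly under the root; derivatives with nested layouts would need their own handling. Each worker lists only the top level of the root once, then walks just the subject directories assigned to it.

```python
import os
import tempfile
import zlib


def stable_hash(key: str) -> int:
    # Deterministic across processes, unlike the built-in str hash().
    return zlib.crc32(key.encode("utf-8"))


def iter_subject_dirs(root: str):
    """Yield top-level BIDS subject directories (sub-<label>) under root."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir() and entry.name.startswith("sub-"):
                yield entry.path


def iter_files_subject_level(root: str, worker_id: int, num_workers: int):
    """Hypothetical subject-level crawl: partition on the subject directory,
    then walk only the subjects assigned to this worker."""
    for sub_dir in iter_subject_dirs(root):
        # Hash the subject label rather than the full path, so the
        # assignment does not depend on where the dataset is mounted.
        if stable_hash(os.path.basename(sub_dir)) % num_workers != worker_id:
            continue
        for dirpath, _dirnames, filenames in os.walk(sub_dir):
            for name in filenames:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for sub in ("sub-01", "sub-02", "sub-03"):
            os.makedirs(os.path.join(root, sub, "anat"))
            open(os.path.join(root, sub, "anat", f"{sub}_T1w.nii.gz"), "w").close()
        for wid in range(2):
            files = iter_files_subject_level(root, wid, 2)
            print(wid, sorted(os.path.relpath(p, root) for p in files))
```

CRC32 is used here instead of the built-in `hash()` because string hashing in Python is randomized per process, which would give separate worker processes inconsistent assignments.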

It also means data will be better grouped in the table, since files are partitioned by subject directory rather than by full path, so all of a subject's files end up in the same partition.
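The grouping effect can be checked without touching the filesystem. The sketch below assumes relative BIDS paths that begin with the subject directory and compares how many partitions each subject's files span when hashing the full path versus the subject directory. The helper names are hypothetical.

```python
import zlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    return zlib.crc32(key.encode("utf-8"))


def subject_of(rel_path: str) -> str:
    # Assumes BIDS-style relative paths that start with the subject directory.
    return rel_path.split("/", 1)[0]


def partitions_per_subject(rel_paths, num_parts, by_subject):
    """Map each subject to the set of partitions its files land in."""
    spread = defaultdict(set)
    for rel in rel_paths:
        key = subject_of(rel) if by_subject else rel
        spread[subject_of(rel)].add(stable_hash(key) % num_parts)
    return dict(spread)


rel_paths = [
    f"sub-{s:02d}/{dt}/sub-{s:02d}_{suffix}"
    for s in range(1, 5)
    for dt, suffix in [("anat", "T1w.nii.gz"), ("anat", "T1w.json"),
                       ("func", "task-rest_bold.nii.gz"), ("func", "task-rest_bold.json")]
]

by_path = partitions_per_subject(rel_paths, 4, by_subject=False)
by_sub = partitions_per_subject(rel_paths, 4, by_subject=True)
# With subject-level keys every subject occupies exactly one partition;
# with path-level keys a subject's files can be spread across several.
print({s: sorted(p) for s, p in by_path.items()})
print({s: sorted(p) for s, p in by_sub.items()})
```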