The previous implementation used iglob to scan the BIDS root directory recursively. Profiling on the large ABCD dataset showed that up to 60% of the time was spent on this scan alone.
This change uses a different source that yields subject directories rather than individual files. Hash partitioning now happens at the subject-directory level, which reduces the work each individual worker has to do.
It also groups data better in the table, since files are partitioned by subject directory rather than by full path.
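
For illustration only, here is a minimal Python sketch of the idea, not the code in this change. It assumes subject directories sit directly under the BIDS root and are named sub-*; the helper names iter_subject_dirs, partition_for and iter_subject_files are hypothetical, and CRC32 stands in for whatever hash the pipeline actually uses.

```python
import os
import zlib
from typing import Iterator


def iter_subject_dirs(bids_root: str) -> Iterator[str]:
    """Yield top-level sub-* directories without descending into them.

    Hypothetical helper: a single os.scandir call replaces the recursive glob.
    The order of results is whatever the filesystem returns.
    """
    with os.scandir(bids_root) as entries:
        for entry in entries:
            if entry.is_dir() and entry.name.startswith("sub-"):
                yield entry.path


def partition_for(subject_dir: str, num_partitions: int) -> int:
    """Map a subject directory to a partition using a stable hash.

    Assumption: CRC32 of the directory name; the real pipeline may hash differently.
    """
    key = os.path.basename(os.path.normpath(subject_dir)).encode("utf-8")
    return zlib.crc32(key) % num_partitions


def iter_subject_files(subject_dir: str) -> Iterator[str]:
    """Walk one subject directory; run by the worker that owns its partition."""
    for dirpath, _dirnames, filenames in os.walk(subject_dir):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Because the partition key is the subject directory, every file under one subject is handled by the same worker and ends up together, and the expensive walk is split across workers instead of being done up front.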