SINTEF / pyopia

Python Ocean Particle Image Analysis
https://pyopia.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add more strategies for chunking file list to speed up processing with imbalanced dataset #322

Closed by nepstad 2 months ago

nepstad commented 2 months ago

When an image dataset is imbalanced, i.e. a subset of images clustered in time takes much longer to process than the rest, the block chunking we use now can put most of the slow images into a single chunk. The chunks then become very uneven, and the total processing time is dominated by that one chunk. Adding striping or other strategies for splitting up the file list may yield better results in these cases.
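
To illustrate the difference, here is a minimal sketch comparing block and striped splitting of a file list. The function names (`block_chunks`, `striped_chunks`), the file names, and the per-image costs are hypothetical and made up for illustration; this is not pyopia's actual API or implementation.

```python
def block_chunks(files, n_chunks):
    """Split files into n_chunks contiguous blocks (hypothetical helper)."""
    size, remainder = divmod(len(files), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks get one extra file
        end = start + size + (1 if i < remainder else 0)
        chunks.append(files[start:end])
        start = end
    return chunks


def striped_chunks(files, n_chunks):
    """Deal files out round-robin: chunk i gets files i, i + n_chunks, ... (hypothetical helper)."""
    return [files[i::n_chunks] for i in range(n_chunks)]


# Made-up example: the first two images (adjacent in time) are 10x slower
files = [f"img_{i:03d}.png" for i in range(8)]
cost = {name: (10 if i < 2 else 1) for i, name in enumerate(files)}


def chunk_costs(chunks):
    """Total (assumed) processing cost per chunk."""
    return [sum(cost[f] for f in chunk) for chunk in chunks]


print(chunk_costs(block_chunks(files, 4)))    # [20, 2, 2, 2]
print(chunk_costs(striped_chunks(files, 4)))  # [11, 11, 2, 2]
```

In this toy case the slowest chunk drops from a cost of 20 to 11 with striping, since the two slow neighbours end up in different chunks. Striping only helps when the slow images are clustered; if they happen to fall at a stride equal to the number of chunks, it could do worse than block splitting.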