cytomining / CytoTable

Transform CellProfiler and DeepProfiler data for processing image-based profiling readouts with Pycytominer and other Cytomining tools.
https://cytomining.github.io/CytoTable/
BSD 3-Clause "New" or "Revised" License

Add capability to sample source datasets by percent or rowcount to increase speed of testing #190

Open d33bs opened 5 months ago

d33bs commented 5 months ago

When adjusting a CytoTable configuration alongside a source dataset, processing the full data can take a long time, only to reveal a configuration failure that could have been detected quickly with a smaller portion of the source (for instance, 1% of the data). This issue proposes a feature that would limit the data CytoTable reads, for example by using the SAMPLE feature from DuckDB, so that only a percentage or a fixed number of rows is extracted from the source.
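As a rough illustration of the idea, the sketch below builds a DuckDB `USING SAMPLE` clause for either a percentage or a row count. The function name `build_sample_clause` and its parameters are hypothetical and not part of CytoTable's API; the choice of `bernoulli` for percent sampling and `reservoir` for row-count sampling is an assumption made for this example.

```python
from typing import Optional


def build_sample_clause(
    percent: Optional[float] = None,
    rows: Optional[int] = None,
    seed: Optional[int] = None,
) -> str:
    """Return a DuckDB ``USING SAMPLE`` clause (hypothetical helper).

    Exactly one of ``percent`` or ``rows`` must be given. ``seed`` is
    optional and maps to DuckDB's REPEATABLE clause.
    """
    if (percent is None) == (rows is None):
        raise ValueError("Specify exactly one of percent or rows.")

    if percent is not None:
        if not 0 < percent <= 100:
            raise ValueError("percent must be greater than 0 and at most 100.")
        # Bernoulli sampling keeps each row independently with the given
        # probability, so the result size is only approximately `percent`.
        clause = f"USING SAMPLE bernoulli({percent:g} PERCENT)"
    else:
        if rows <= 0:
            raise ValueError("rows must be a positive integer.")
        # Reservoir sampling returns a fixed number of rows
        # (or fewer, if the source has fewer rows than requested).
        clause = f"USING SAMPLE reservoir({rows} ROWS)"

    if seed is not None:
        clause += f" REPEATABLE ({seed})"
    return clause
```

A clause produced this way could be appended to a source query, for example `SELECT * FROM read_csv('Cells.csv') USING SAMPLE bernoulli(1 PERCENT)`, where `Cells.csv` is a placeholder path. Whether sampling should happen per source table or per joined result is a design question this sketch does not address.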