cytomining / CytoTable

Transform CellProfiler and DeepProfiler data for processing image-based profiling readouts with Pycytominer and other Cytomining tools.
https://cytomining.github.io/CytoTable/
BSD 3-Clause "New" or "Revised" License

Add large test data cases to help detect non-deterministic challenges #193

Closed d33bs closed 2 weeks ago

d33bs commented 3 months ago

Probably best served by a different PR, but I'm wondering if it is worth adding a test that ensures, in a large enough dataset, that there are no duplicate rows and all expected rows are present.

Originally posted by @gwaybio in https://github.com/cytomining/CytoTable/pull/182#pullrequestreview-2007415041

This could be done with a synthetic dataset created by duplicating source data rows with minor variations. Alternatively, the test could reference a remote dataset.
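As a rough sketch of the synthetic-data idea (not CytoTable's actual test code), the snippet below duplicates a tiny source table with small variations and then checks a result for duplicate or missing rows. The column names, `SOURCE_ROWS`, `make_synthetic_rows`, and `check_rows` are all hypothetical and only for illustration; a real test would presumably compare against the output of a CytoTable conversion.

```python
import random
from collections import Counter

# Hypothetical stand-in for a small source table; column names are illustrative only.
SOURCE_ROWS = [
    {"ImageNumber": 1, "ObjectNumber": 1, "AreaShape_Area": 100.0},
    {"ImageNumber": 1, "ObjectNumber": 2, "AreaShape_Area": 150.0},
    {"ImageNumber": 2, "ObjectNumber": 1, "AreaShape_Area": 120.0},
]


def make_synthetic_rows(source_rows, copies, seed=0):
    """Duplicate source rows `copies` times, offsetting keys and jittering values."""
    rng = random.Random(seed)  # fixed seed keeps the synthetic data reproducible
    max_image = max(row["ImageNumber"] for row in source_rows)
    synthetic = []
    for copy_index in range(copies):
        for row in source_rows:
            synthetic.append(
                {
                    # Shift ImageNumber so every copy keeps a unique key.
                    "ImageNumber": row["ImageNumber"] + copy_index * max_image,
                    "ObjectNumber": row["ObjectNumber"],
                    # Minor variation in the measurement value.
                    "AreaShape_Area": row["AreaShape_Area"] + rng.uniform(-1.0, 1.0),
                }
            )
    return synthetic


def check_rows(expected_rows, actual_rows, key_columns=("ImageNumber", "ObjectNumber")):
    """Return (duplicate_keys, missing_keys) for actual rows versus expected rows."""

    def key(row):
        return tuple(row[column] for column in key_columns)

    actual_counts = Counter(key(row) for row in actual_rows)
    expected_keys = {key(row) for row in expected_rows}
    duplicates = sorted(k for k, count in actual_counts.items() if count > 1)
    missing = sorted(expected_keys - set(actual_counts))
    return duplicates, missing
```

A test could then assert that both returned lists are empty for the converted output, which covers the "no duplicate rows" and "all expected rows present" conditions in one pass.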

It would likely be useful to add a Pytest marker (for example, "large") that flags these tests, so that they can be skipped by default during development.
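One way the marker could work is sketched below as a `conftest.py`, following the common Pytest pattern of a custom marker plus an opt-in command line flag. The marker name `large` comes from the suggestion above; the `--run-large` flag name is an assumption for illustration, not an existing CytoTable option.

```python
# conftest.py (sketch; the "--run-large" flag name is hypothetical)
import pytest


def pytest_addoption(parser):
    # Opt-in flag so large tests only run when explicitly requested.
    parser.addoption(
        "--run-large",
        action="store_true",
        default=False,
        help="run tests marked as large",
    )


def pytest_configure(config):
    # Register the marker so Pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "large: test uses a large dataset and is skipped by default"
    )


def pytest_collection_modifyitems(config, items):
    # Leave collection untouched when the flag is given.
    if config.getoption("--run-large"):
        return
    skip_large = pytest.mark.skip(reason="needs --run-large option to run")
    for item in items:
        if "large" in item.keywords:
            item.add_marker(skip_large)
```

With this in place, a test decorated with `@pytest.mark.large` would be skipped by a plain `pytest` run and executed with `pytest --run-large`.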