Probably best served by a different PR, but I'm wondering if it is worth adding a test that ensures, in a large enough dataset, that there are no duplicate rows and all expected rows are present.
Such a test could use a synthetic dataset created by duplicating source data rows with minor variations. It could also reference a remote dataset.
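As a rough sketch of the synthetic-dataset idea, the following standard-library Python builds a larger dataset by copying a few source rows with small variations, then checks the result for duplicate and missing rows. The column names (`row_id`, `value`) and the helpers `make_synthetic_rows` and `check_rows` are hypothetical and not part of CytoTable; a real test would compare against the rows read back after conversion.

```python
import random
from collections import Counter


def make_synthetic_rows(source_rows, copies, seed=0):
    """Duplicate each source row `copies` times, varying the key and one value."""
    rng = random.Random(seed)
    rows = []
    for copy_index in range(copies):
        for row in source_rows:
            varied = dict(row)
            # A unique key per (source row, copy) makes duplicates detectable later.
            varied["row_id"] = f"{row['row_id']}-{copy_index}"
            # Minor variation of a measurement value.
            varied["value"] = row["value"] + rng.uniform(-0.01, 0.01)
            rows.append(varied)
    return rows


def check_rows(result_rows, expected_ids):
    """Return (duplicated ids, missing ids) for a collection of output rows."""
    counts = Counter(row["row_id"] for row in result_rows)
    duplicates = sorted(key for key, count in counts.items() if count > 1)
    missing = sorted(set(expected_ids) - set(counts))
    return duplicates, missing


# Hypothetical source data; a real test would start from existing test fixtures.
source = [{"row_id": "a", "value": 1.0}, {"row_id": "b", "value": 2.0}]
synthetic = make_synthetic_rows(source, copies=1000)
expected_ids = [row["row_id"] for row in synthetic]

# Stand-in for the rows produced by the code under test.
result = list(synthetic)
duplicates, missing = check_rows(result, expected_ids)
assert duplicates == [] and missing == []
```

Returning the offending ids rather than a boolean should make a failure easier to diagnose on a large dataset.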
It would likely be useful to add a pytest marker labeling the test as "large" so that it is skipped by default during development.
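One possible way to skip such tests by default is the opt-in pattern below, adapted from the command-line-option approach described in the pytest documentation. The flag name `--run-large` and the test name are assumptions made for illustration, not existing CytoTable conventions.

```python
# conftest.py (sketch)
import pytest


def pytest_addoption(parser):
    # Hypothetical flag name: opt in to large tests from the command line.
    parser.addoption(
        "--run-large",
        action="store_true",
        default=False,
        help="run tests marked as large",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "large: test uses a large dataset")


def pytest_collection_modifyitems(config, items):
    # When the flag is given, leave the collected tests untouched.
    if config.getoption("--run-large"):
        return
    skip_large = pytest.mark.skip(reason="needs --run-large to run")
    for item in items:
        if "large" in item.keywords:
            item.add_marker(skip_large)


# In a test module (sketch): mark the large-dataset test.
@pytest.mark.large
def test_no_duplicate_rows_large():
    pass  # placeholder for the duplicate and missing-row checks
```

With this in place, a plain `pytest` run would skip the marked test, and `pytest --run-large` would include it. A lighter alternative is to register the marker and deselect it with `-m "not large"`.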
Originally posted by @gwaybio in https://github.com/cytomining/CytoTable/pull/182#pullrequestreview-2007415041