Add large test data cases to help detect non-deterministic challenges

Probably best served by a different PR, but I'm wondering if it is worth adding a test that ensures, in a large enough dataset, that there are no duplicate rows and all expected rows are present.

Originally posted by @gwaybio in https://github.com/cytomining/CytoTable/pull/182#pullrequestreview-2007415041

This could be done with a synthetic dataset created by duplicating source data rows with minor variations. Alternatively, the test could reference a remote dataset.
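As a rough sketch of the synthetic-dataset idea, the snippet below duplicates a few source rows with small variations and then checks a result for duplicate or missing rows. It uses only the standard library. The column names, `SOURCE_ROWS`, `make_synthetic_rows`, and `check_rows` are hypothetical names chosen for illustration and are not part of CytoTable.

```python
from collections import Counter

# Hypothetical source rows; a real test would load these from existing test data.
SOURCE_ROWS = [
    {"ImageNumber": 1, "ObjectNumber": 1, "AreaShape_Area": 10.0},
    {"ImageNumber": 1, "ObjectNumber": 2, "AreaShape_Area": 12.5},
    {"ImageNumber": 2, "ObjectNumber": 1, "AreaShape_Area": 9.0},
]

# Assumed key columns that should uniquely identify a row.
KEY_COLUMNS = ("ImageNumber", "ObjectNumber")


def make_synthetic_rows(source_rows, copies):
    """Duplicate source rows, offsetting ImageNumber so each copy has a unique key."""
    max_image = max(row["ImageNumber"] for row in source_rows)
    rows = []
    for copy_index in range(copies):
        for row in source_rows:
            new_row = dict(row)
            # Shift the key so copies never collide with each other.
            new_row["ImageNumber"] = row["ImageNumber"] + copy_index * max_image
            # Minor deterministic variation in a measurement column.
            new_row["AreaShape_Area"] = row["AreaShape_Area"] + copy_index * 0.01
            rows.append(new_row)
    return rows


def check_rows(expected_rows, actual_rows, key_columns=KEY_COLUMNS):
    """Return (duplicate_keys, missing_keys) for actual_rows versus expected_rows."""

    def key(row):
        return tuple(row[column] for column in key_columns)

    actual_counts = Counter(key(row) for row in actual_rows)
    duplicates = sorted(k for k, count in actual_counts.items() if count > 1)
    missing = sorted({key(row) for row in expected_rows} - set(actual_counts))
    return duplicates, missing
```

In a test, `expected_rows` would be the synthetic input and `actual_rows` would be whatever the conversion under test produced; both returned lists should be empty. Running the conversion several times against the same input may help surface non-deterministic behavior.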

It would likely be useful to add a pytest marker labeling the test as "large" so that developers can skip it by default.
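One possible way to set this up, sketched below, follows the pattern from the pytest documentation for skipping tests unless a command-line option is given. The `--run-large` flag name, the marker description, and the example test are assumptions for illustration, not existing CytoTable configuration.

```python
# conftest.py (sketch): register a "large" marker and skip it unless requested.
import pytest


def pytest_addoption(parser):
    # "--run-large" is a hypothetical flag name chosen for this example.
    parser.addoption(
        "--run-large",
        action="store_true",
        default=False,
        help="run tests marked as large",
    )


def pytest_configure(config):
    # Registering the marker avoids unknown-marker warnings.
    config.addinivalue_line("markers", "large: test uses a large dataset")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-large"):
        # Flag given on the command line: run everything, including large tests.
        return
    skip_large = pytest.mark.skip(reason="needs --run-large option to run")
    for item in items:
        if "large" in item.keywords:
            item.add_marker(skip_large)


# In a test module, a large test would then be marked like this (hypothetical test):
@pytest.mark.large
def test_large_dataset_has_no_duplicate_rows():
    ...
```

With this in place, `pytest` skips the large tests and `pytest --run-large` includes them. A lighter alternative would be to register the marker and deselect it by default with `-m "not large"` in the project's pytest `addopts`.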

Add large test data cases to help detect non-deterministic challenges #193