Closed: atolopko-czi closed this 1 month ago.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.15%. Comparing base (c18c1a9) to head (e3dd3b1). Report is 8 commits behind head on main.
For informational purposes, a description of the general algorithm is captured in this GitHub issue:
https://github.com/chanzuckerberg/cellxgene-census/issues/1146
Per a synchronous conversation with @ebezzi, we decided to scrap the test because it needs more thought. We will open a ticket to write a better test, but for the sake of expediency we want to get this merged so that some of our first users can start using it. Anecdotal evidence suggests that the scatter-gather-shuffle algorithm performs well and produces good randomness.
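For readers who have not followed the linked issue, here is a minimal, stdlib-only Python sketch of the scatter-gather-shuffle idea as we understand it; it is an illustration, not the cellxgene-census implementation. The function `scatter_gather_shuffle` and its `chunk_size` and `chunks_per_buffer` parameters are hypothetical names. Rows are split into contiguous chunks (cheap to read), the chunk order is shuffled (scatter), a few chunks at a time are concatenated into one in-memory buffer (gather), and that buffer is shuffled before its rows are emitted.

```python
import random
from typing import Iterator, List, Optional, Sequence


def scatter_gather_shuffle(
    ids: Sequence[int],
    chunk_size: int,
    chunks_per_buffer: int,
    seed: Optional[int] = None,
) -> Iterator[int]:
    """Yield every id exactly once, in a shuffled order (illustrative sketch).

    Hypothetical helper: not part of the cellxgene-census API.
    """
    if chunk_size <= 0 or chunks_per_buffer <= 0:
        raise ValueError("chunk_size and chunks_per_buffer must be positive")
    rng = random.Random(seed)

    # Scatter: split ids into contiguous chunks, then shuffle the chunk order.
    chunks: List[Sequence[int]] = [
        ids[i : i + chunk_size] for i in range(0, len(ids), chunk_size)
    ]
    rng.shuffle(chunks)

    # Gather: concatenate `chunks_per_buffer` chunks into one in-memory buffer.
    for start in range(0, len(chunks), chunks_per_buffer):
        buffer = [x for chunk in chunks[start : start + chunks_per_buffer] for x in chunk]
        # Shuffle: permute rows within the buffer before emitting them.
        rng.shuffle(buffer)
        yield from buffer


print(list(scatter_gather_shuffle(list(range(12)), chunk_size=3, chunks_per_buffer=2, seed=1)))
```

Each id is emitted exactly once per pass. Raising `chunks_per_buffer` mixes rows drawn from more distant parts of the data into each buffer, at the cost of holding more rows in memory at once.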
Adds a `shuffle_chunk_count` parameter, which improves the randomness of shuffling while allowing explicit tuning of memory usage vs. I/O performance.
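The trade-off can be made concrete with a small cost model. This is a sketch under stated assumptions, not a description of the library's internals: we assume an in-memory buffer of `buffer_rows` rows is filled from `shuffle_chunk_count` contiguous chunks, each fetched with one range read. `shuffle_layout`, `ShuffleLayout`, and `buffer_rows` are hypothetical names; only `shuffle_chunk_count` comes from this PR, and its exact semantics here are assumed.

```python
from typing import NamedTuple


class ShuffleLayout(NamedTuple):
    chunk_rows: int         # rows per contiguous chunk (one range read each, by assumption)
    reads_per_epoch: int    # range reads needed to visit every row once
    buffers_per_epoch: int  # times the in-memory buffer is filled and shuffled


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def shuffle_layout(n_rows: int, buffer_rows: int, shuffle_chunk_count: int) -> ShuffleLayout:
    """Hypothetical cost model: a buffer of `buffer_rows` rows is assembled
    from `shuffle_chunk_count` randomly chosen contiguous chunks."""
    if min(n_rows, buffer_rows, shuffle_chunk_count) <= 0:
        raise ValueError("all arguments must be positive")
    chunk_rows = max(1, buffer_rows // shuffle_chunk_count)
    n_chunks = ceil_div(n_rows, chunk_rows)
    return ShuffleLayout(
        chunk_rows=chunk_rows,
        reads_per_epoch=n_chunks,
        buffers_per_epoch=ceil_div(n_chunks, shuffle_chunk_count),
    )


for count in (1, 8, 64):
    print(count, shuffle_layout(n_rows=1_000_000, buffer_rows=64_000, shuffle_chunk_count=count))
```

Under this model, with the buffer (memory) held at 64,000 rows, moving `shuffle_chunk_count` from 1 to 64 shrinks each chunk from 64,000 to 1,000 rows. Each buffer then mixes rows from 64 scattered regions instead of one, but an epoch needs 1,000 range reads instead of 16. Growing `buffer_rows` at a fixed count goes the other way: larger chunks and fewer reads, paid for with more memory.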