tkuntz-hsph closed this 3 months ago
Thank you @tkuntz-hsph ! Looks great.
Would we ever have a case where there are fewer than 4000 items to chunk? If so, would this be okay with a single chunk smaller than 4000 items? I am just wondering if we need to add some code to account for that case.
This will result in one chunk if there are fewer than 4K representative sequences, so it'll run correctly. It also handles the case where the number of sequences is only slightly larger than the chunk size: rather than producing one full chunk and one tiny remainder, it splits the sequences into two roughly equal chunks.
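The behavior described above can be sketched as follows. This is a minimal, hypothetical Python helper (not the PR's actual code, and `chunk_bounds`/`max_chunk` are names assumed for illustration): it computes the fewest chunks of at most `max_chunk` items, then evens out the sizes so a near-boundary input yields two roughly equal halves instead of one full chunk plus a small remainder.

```python
import math

def chunk_bounds(n_items, max_chunk=4000):
    """Return (start, end) index pairs splitting n_items into the fewest
    chunks of at most max_chunk items, with sizes as even as possible.

    Hypothetical sketch of the chunking behavior discussed above.
    """
    # Fewest chunks that keeps every chunk <= max_chunk (at least one chunk).
    n_chunks = max(1, math.ceil(n_items / max_chunk))
    # Distribute items as evenly as possible: `extra` chunks get one more item.
    base, extra = divmod(n_items, n_chunks)
    sizes = [base + 1] * extra + [base] * (n_chunks - extra)
    bounds, start = [], 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    return bounds
```

For example, 100 sequences produce a single chunk of 100, while 4100 sequences produce two chunks of 2050 each rather than 4000 + 100.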
Fantastic! Thank you!
DADA2 has a known issue on some versions of R where garbage collection isn't run correctly during species assignment, leading to large memory requirements. This is avoided by splitting the table into chunks and processing them in sequence.