Accelerates the deduplication and collapsing of reads with Unique Molecular Identifiers (UMIs). It is heavily optimized for scalability and is orders of magnitude faster than a previous tool.
Handling of undetermined (`N`) bases is currently incomplete: using `umiDist` and `charSet` together will produce bugs when the UMI contains `N`s, because `charSet` does not update the separate `N` bit set. Cloning UMIs that contain `N`s has similar issues.
This does not affect most of the current code, but it needs to be looked into before any future changes that rely on these operations.
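
To make the pitfall concrete, here is a minimal Python sketch of the failure mode. It is not the project's actual code: the `PackedUmi` class, the 2-bit encoding, the `n_mask` field, and the `char_set_buggy` / `char_set_fixed` / `umi_dist` functions are all hypothetical stand-ins, and the rule that an `N` mismatches any determined base is an assumption made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical 2-bit base encoding. N has no code of its own here;
# it is tracked in a separate bit mask (an assumption for this sketch).
ENCODING = {"A": 0b00, "T": 0b01, "C": 0b10, "G": 0b11}


@dataclass
class PackedUmi:
    bits: int = 0    # 2 bits per base, position i at bits [2i, 2i+1]
    n_mask: int = 0  # bit i set => position i is an undetermined base (N)
    length: int = 0

    @classmethod
    def from_string(cls, seq):
        umi = cls(length=len(seq))
        for i, base in enumerate(seq):
            umi.char_set_fixed(i, base)
        return umi

    def char_set_buggy(self, i, base):
        """Overwrite the base at position i but leave n_mask untouched."""
        code = ENCODING.get(base, 0)
        self.bits = (self.bits & ~(0b11 << (2 * i))) | (code << (2 * i))

    def char_set_fixed(self, i, base):
        """Overwrite the base at position i and keep n_mask in sync."""
        self.char_set_buggy(i, base)
        if base == "N":
            self.n_mask |= 1 << i
        else:
            self.n_mask &= ~(1 << i)


def umi_dist(a, b):
    """Hamming distance; assumes N mismatches any determined base."""
    dist = 0
    for i in range(a.length):
        a_n = (a.n_mask >> i) & 1
        b_n = (b.n_mask >> i) & 1
        a_base = (a.bits >> (2 * i)) & 0b11
        b_base = (b.bits >> (2 * i)) & 0b11
        if a_n != b_n or (not a_n and a_base != b_base):
            dist += 1
    return dist


ref = PackedUmi.from_string("ACGT")

stale = PackedUmi.from_string("ANGT")
stale.char_set_buggy(1, "C")     # sequence is now ACGT, but the N flag is stale
print(umi_dist(stale, ref))      # prints 1, although the sequences are identical

synced = PackedUmi.from_string("ANGT")
synced.char_set_fixed(1, "C")    # N flag cleared together with the base
print(umi_dist(synced, ref))     # prints 0
```

In this sketch the distance is wrong only because the setter and the distance function disagree about where `N` information lives; a clone operation that copied `bits` without `n_mask` would go wrong in the same way.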