Very slow paired reads mode for transcriptome

Hi!

I am trying to make UMICollapse the default tool in one of the popular RNAseq analysis pipelines -- https://github.com/nf-core/rnaseq/issues/1087.

I am not sure whether this is already covered by #5, but with paired reads aligned to the human transcriptome, UMICollapse appears to be roughly 20x slower than umi-tools. UMICollapse takes between 9 and 10 hours for the BAM files we are considering, whereas umi-tools takes ~30 minutes. The slowdown occurs in both two-pass and single-pass modes.
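
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two runs could be timed. The file names are hypothetical, and the command-line flags are my assumptions about each tool's interface, so please check them against each tool's help output before running. The arithmetic at the end uses only the rough timings quoted above.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of strings) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def slowdown(slow_seconds, fast_seconds):
    """Return how many times slower the first run was than the second."""
    return slow_seconds / fast_seconds


# Hypothetical file names. The flags are assumptions to verify against
# each tool's --help; drop "--two-pass" to time the single-pass mode.
UMICOLLAPSE_CMD = [
    "umicollapse", "bam",
    "-i", "input.bam", "-o", "umicollapse_dedup.bam",
    "--paired", "--two-pass",
]
UMI_TOOLS_CMD = [
    "umi_tools", "dedup",
    "-I", "input.bam", "-S", "umitools_dedup.bam",
    "--paired",
]

# Rough figures from this report: 9 to 10 hours versus about 30 minutes.
print(slowdown(9 * 3600, 30 * 60))   # 18.0
print(slowdown(10 * 3600, 30 * 60))  # 20.0
```

With both tools installed, `slowdown(time_command(UMICOLLAPSE_CMD), time_command(UMI_TOOLS_CMD))` would give the measured ratio on a given BAM file.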

I have not gone through how UMICollapse works, so I do not have an opinion on whether this is expected or not. If it is expected, some commentary on this in the README would be appreciated.

I have made some test data available on Google Drive. The BAM file has 44,319,354 read pairs with 8 bp UMIs.

Thank you for continuing to follow up on your work from a long time ago.

Daniel-Liu-c0deb0t/UMICollapse, issue #31