Open lisagrigoreva opened 1 year ago
"Unique reads" in this case is the number of reads whose UMIs are not exactly identical to any other read's UMI. This count does not account for errors in the UMIs, which is why it is greater than the number of reads after deduplicating: the deduplication process allows similar (but not exactly identical) UMIs to be grouped together.
Thank you! Is it possible to get only the reads with identical UMIs? I suppose by setting p=1?
If you only want to deduplicate reads that have the exact same UMI, you should pass in -k 0 to indicate that zero errors are tolerated.
Hi, I was wondering what exactly the 'Number of unique reads' output represents when collapsing reads from a FASTQ file.
Number of input read 6609696
Number of unique reads 3885326
Number of reads after deduplicating 3028828
I ask because it seems like the number of unique reads should be similar to the number of reads after deduplicating.