chris-mcginnis-ucsf / MULTI-seq

R implementation of MULTI-seq sample classification workflow

Removal of duplicated UMIs #12

Closed: MichaelPeibo closed this issue 4 years ago

MichaelPeibo commented 4 years ago

Hi @chris-mcginnis-ucsf,

Thanks for developing this great package.

I am trying to understand the preprocessing step. In the `MULTIseq.Align.Suite.R` code, there is a step called "Remove reads representing duplicated UMIs on a cell-by-cell basis": https://github.com/chris-mcginnis-ucsf/MULTI-seq/blob/73411337bc470b846c12f201f96699ae5e1188be/R/MULTIseq.Align.Suite.R#L50-L52
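For readers with the same question, here is a minimal Python sketch of what "remove duplicated UMIs on a cell-by-cell basis" means. It is an illustration, not the package's R code. It assumes each read is a hypothetical `(cell_barcode, umi, sample_barcode)` tuple and that a UMI counts as duplicated only when it repeats among the reads of the same cell. Under that assumption, only the first read for each (cell, UMI) pair is kept. `dedup_umis_per_cell` is a hypothetical name.

```python
from typing import List, Tuple

# Hypothetical read layout: (cell_barcode, umi, sample_barcode).
# This is NOT the package's readTable format, just an illustration.
Read = Tuple[str, str, str]


def dedup_umis_per_cell(reads: List[Read]) -> List[Read]:
    """Keep only the first read seen for each (cell, UMI) pair."""
    seen = set()
    kept = []
    for cell, umi, sample in reads:
        key = (cell, umi)  # duplicates are judged within one cell only
        if key not in seen:
            seen.add(key)
            kept.append((cell, umi, sample))
    return kept


reads = [
    ("CELL1", "AAAA", "BC1"),
    ("CELL1", "AAAA", "BC1"),  # duplicate read of the same molecule: dropped
    ("CELL1", "CCCC", "BC1"),
    ("CELL2", "AAAA", "BC2"),  # same UMI but a different cell: kept
]
print(dedup_umis_per_cell(reads))
```

Because the key includes the cell, the same UMI sequence appearing in two different cells is not treated as a duplicate.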

I could not quite follow it. Two questions:

1. Are these duplicated UMIs detected within the same cell?
2. Are multiple (duplicated) UMIs used when calculating the barcode counts matrix?

Or maybe I did not understand the code.

Thanks!

MichaelPeibo commented 4 years ago

I think I figured this out. The count for a given barcode is the number of distinct UMIs, no matter how many reads carry the same UMI.
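The conclusion above, that a barcode's count is the number of distinct UMIs rather than the number of reads, can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's implementation. Reads are hypothetical `(cell_barcode, umi, sample_barcode)` tuples, and `umi_count_matrix` is a hypothetical name. The sketch counts distinct UMIs per (cell, barcode) pair. If the R code deduplicates on the UMI alone within a cell, a UMI shared by two barcodes in one cell would be counted once there instead of once per barcode.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def umi_count_matrix(
    reads: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, int]]:
    """Return {cell: {sample_barcode: number of distinct UMIs}}."""
    # Collect the set of UMIs seen for each (cell, sample barcode) pair;
    # repeated reads of the same UMI collapse into one set entry.
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, umi, sample in reads:
        umis[(cell, sample)].add(umi)

    # The count is the size of each UMI set, not the number of reads.
    matrix: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (cell, sample), umi_set in umis.items():
        matrix[cell][sample] = len(umi_set)
    return {cell: dict(row) for cell, row in matrix.items()}


# Five reads of UMI AAAA plus one read of CCCC give a count of 2, not 6.
reads = [("CELL1", "AAAA", "BC1")] * 5 + [
    ("CELL1", "CCCC", "BC1"),
    ("CELL2", "GGGG", "BC2"),
]
print(umi_count_matrix(reads))
```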