Open PeteHaitch opened 5 years ago
For `UMI_cor = 1`, all UMIs that mapped to the same gene are grouped together and duplicated UMIs are removed.
For `UMI_cor = 2`, all UMIs that mapped to the same gene and the same position are grouped together and duplicated UMIs are removed. So `UMI_cor = 2` assumes that one molecule, after amplification, only ever generates the same fragment.
Later on, I realized that this is rarely the case: most protocols, including 10X and CEL-seq2, involve pre-amplification before the full-length cDNAs are cut into fragments, so there can be more than one fragment per mRNA molecule. I didn't delete the option in case it is useful in some special situation, but most of the time it should not be used.
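To make the difference concrete, here is a minimal sketch of the two grouping rules as described in this comment. It is not scPipe's implementation: the `Read` record, the `count_umis` function, and the `use_position` flag are hypothetical names made up for illustration. It also treats only exact duplicate UMIs and ignores any merging of near-identical UMI sequences.

```python
from collections import namedtuple

# Hypothetical read record: gene, mapping position, and UMI sequence.
Read = namedtuple("Read", ["gene", "pos", "umi"])


def count_umis(reads, use_position=False):
    """Count distinct UMIs per gene (exact duplicates only).

    use_position=False groups by gene, as described for UMI_cor = 1.
    use_position=True groups by gene and position, as described for UMI_cor = 2.
    """
    seen = set()
    for r in reads:
        key = (r.gene, r.pos, r.umi) if use_position else (r.gene, r.umi)
        seen.add(key)  # duplicates within a group collapse to one entry

    counts = {}
    for key in seen:
        gene = key[0]
        counts[gene] = counts.get(gene, 0) + 1
    return counts


# One mRNA molecule (UMI "AACG") that was pre-amplified and then fragmented
# at two different sites, plus a duplicate of the first fragment.
reads = [
    Read("GeneA", 100, "AACG"),
    Read("GeneA", 100, "AACG"),
    Read("GeneA", 350, "AACG"),
]

gene_level = count_umis(reads)                          # {'GeneA': 1}
position_level = count_umis(reads, use_position=True)   # {'GeneA': 2}
print(gene_level, position_level)
```

In this toy input a single molecule is counted once under gene-level grouping but twice under gene-plus-position grouping, which is the overcounting that pre-amplification followed by fragmentation can cause.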
Thanks, Luyi. So `UMI_cor = 1` is the recommended value?
I think it would be useful to update the documentation with those extra details and ensure the default value matches the most common protocol(s).
Aside: a description of `UMI_cor = 2` is missing from `create_report()`:
https://github.com/LuyiTian/scPipe/blob/02e97841332bb616bab76e9d3ff34d0000e6bb21/R/sc_workflow.R#L203-L211
I'm having a hard time understanding the documentation of `UMI_cor = 2` in `sc_gene_counting()`. Could you please clarify?