10XGenomics / cellranger

10x Genomics Single Cell Analysis
https://www.10xgenomics.com/support/software/cell-ranger

Error in cellranger-arc online algorithms overview regarding duplicate removal #164

Closed kwcurrin closed 8 months ago

kwcurrin commented 2 years ago

Hello,

Thank you for developing the cellranger tools; they are very useful.

It looks like there is an error in the online algorithms overview for cellranger-arc regarding duplicate removal for ATAC-seq data: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/algorithms/overview

The duplicate removal method is initially correctly described here: "When a group of read pairs share the same (start, end, hashed barcode), one of them is labeled as unique and the rest are labeled duplicates. If the unique read passes the filters described in the next paragraph, this is the only read pair that is reported as a fragment in the fragment file."

But the next paragraph, which describes the read filters, contains a sentence that matches the old duplicate removal method, which did not take cell barcodes into account: "Note that as a consequence of this approach, each unique interval on the genome can be associated with only one barcode."
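To make the difference between the two descriptions concrete, here is a minimal sketch of duplicate marking with and without the barcode in the grouping key. It is not the Cell Ranger ARC implementation: the function name `mark_duplicates`, the `include_barcode` flag, the use of the chromosome in the key, and the toy read pairs are all assumptions made for illustration.

```python
def mark_duplicates(read_pairs, include_barcode=True):
    """Label the first read pair seen for each grouping key as "unique"
    and every later one with the same key as "duplicate".

    read_pairs: iterable of (chrom, start, end, barcode) tuples (hypothetical layout).
    include_barcode: if True, group by (chrom, start, end, barcode);
                     if False, group by (chrom, start, end) only.
    """
    seen = set()
    labels = []
    for chrom, start, end, barcode in read_pairs:
        key = (chrom, start, end, barcode) if include_barcode else (chrom, start, end)
        if key in seen:
            labels.append("duplicate")
        else:
            seen.add(key)
            labels.append("unique")
    return labels


# Toy data: three read pairs on the same interval, two barcodes.
read_pairs = [
    ("chr1", 100, 250, "AAAC"),
    ("chr1", 100, 250, "AAAC"),
    ("chr1", 100, 250, "TTTG"),
]

with_barcode = mark_duplicates(read_pairs, include_barcode=True)
without_barcode = mark_duplicates(read_pairs, include_barcode=False)
```

With the barcode in the key, the `TTTG` read pair survives as its own fragment. Without it, only one read pair per interval is kept, which is the situation the "only one barcode" sentence describes.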

The above sentence is not present in the "Algorithms Overview" for cellranger-atac, so maybe it was just left in the cellranger-arc description by accident.

I just wanted to point this out to avoid confusion. When I initially read it, I thought that the duplicate removal step didn't consider barcodes after all. However, I checked the atac_fragments.tsv.gz file, and it contains multiple fragments at the same genomic position linked to different barcodes, as expected.
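A check of this kind can be sketched as follows. This is only an illustration, assuming the fragments file is tab-separated with chromosome, start, end, and barcode in the first four columns and `#`-prefixed header lines; the function names are hypothetical, and the sketch holds every interval in memory, so it is better suited to a subset of a real file.

```python
import gzip
from collections import defaultdict


def intervals_with_multiple_barcodes(lines):
    """Return {(chrom, start, end): set of barcodes} for intervals seen with more than one barcode."""
    barcodes_by_interval = defaultdict(set)
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        barcodes_by_interval[(chrom, int(start), int(end))].add(barcode)
    return {k: v for k, v in barcodes_by_interval.items() if len(v) > 1}


def check_fragments_file(path):
    """Run the check on a gzip-compressed fragments file (e.g. atac_fragments.tsv.gz)."""
    with gzip.open(path, "rt") as handle:
        return intervals_with_multiple_barcodes(handle)


# Toy lines standing in for a fragments file (made-up values).
example_lines = [
    "# header line\n",
    "chr1\t100\t250\tAAAC-1\t2\n",
    "chr1\t100\t250\tTTTG-1\t1\n",
    "chr1\t300\t400\tAAAC-1\t1\n",
]
shared = intervals_with_multiple_barcodes(example_lines)
```

A non-empty result means at least one genomic interval is associated with more than one barcode, which would not be possible if duplicates were removed without regard to barcode.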

Thanks,

Kevin