Hello,
Thank you for developing the cellranger tools; they are very useful.
It looks like there is an error in the online algorithms overview for cellranger-arc regarding duplicate removal for ATAC-seq data: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/algorithms/overview
The duplicate removal method is initially correctly described here: "When a group of read pairs share the same (start, end, hashed barcode), one of them is labeled as unique and the rest are labeled duplicates. If the unique read passes the filters described in the next paragraph, this is the only read pair that is reported as a fragment in the fragment file."
But in the next paragraph referencing the read filters, there is a sentence that matches the old duplicate removal method that did not include cell barcodes: "Note that as a consequence of this approach, each unique interval on the genome can be associated with only one barcode."
The above sentence is not present in the "Algorithms Overview" for cellranger-atac, so maybe it was just left in the cellranger-arc description by accident.
I just wanted to point this out to avoid confusion. When I initially read it, I thought that the duplicate removal step didn't consider barcodes after all. However, I checked the atac_fragments.tsv.gz file and it contains multiple fragments at the same genomic position linked to different barcodes, as expected.
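In case anyone wants to repeat the check, here is a minimal Python sketch of it. It assumes the usual fragment file layout (tab-separated columns chrom, start, end, barcode, read count, with optional header lines starting with "#"). The function name is just something I made up for illustration, and it keeps every interval in memory, so it may need adapting for very large files.

```python
import gzip
from collections import defaultdict


def intervals_with_multiple_barcodes(path):
    """Return {(chrom, start, end): set of barcodes} for intervals seen with 2+ barcodes.

    Assumes a gzipped, tab-separated fragment file whose first four columns are
    chrom, start, end, barcode. Lines starting with '#' are treated as headers.
    """
    barcodes_by_interval = defaultdict(set)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip header/comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
            barcodes_by_interval[(chrom, int(start), int(end))].add(barcode)
    # Keep only the intervals that are shared by more than one barcode.
    return {
        interval: barcodes
        for interval, barcodes in barcodes_by_interval.items()
        if len(barcodes) > 1
    }
```

Calling `intervals_with_multiple_barcodes("atac_fragments.tsv.gz")` and checking that the result is non-empty is enough to confirm that a single genomic interval can be reported for more than one barcode.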
Thanks,
Kevin