UcarLab / AMULET

A count based method for detecting doublets from single nucleus ATAC-seq (snATAC-seq) data.
https://ucarlab.github.io/AMULET/
GNU General Public License v3.0

How to individually evaluate simulated artificial doublets #16

Closed: Yydtqqqg closed this issue 2 years ago

Yydtqqqg commented 2 years ago

I read the original AMULET thesis PDF. Where you simulate artificial doublets to measure recall for multiplet detection, I am wondering how "each one of them is separately added to the existing set of single cells" (them = artificial doublets) and how you "tested the performance of DoubletDetector by evaluating each artificial cell individually". Did you generate a BAM file for each simulated artificial doublet, run the overlap counter (the first step of AMULET) on that BAM file to obtain its overlaps file, and then add those overlaps to the overlaps file produced from the BAM file of the real cells?

alperoglu commented 2 years ago

Hi,

Thank you for your interest in AMULET. The thesis follows an older framework that we used for assessing the artificial/simulated doublets. The updated version is described in the Methods section of our published paper. We don't create a separate BAM file for this. Instead, we map the barcodes of the two nuclei chosen to make up an artificial doublet to the same cell identifier in the singlecell.csv file produced by cellranger-atac. The first part of AMULET is then run with this updated CSV file; the BAM file is not changed. Hence the artificial doublets, which constitute 2.5% of the total number of nuclei, are treated the same as the other nuclei. For the reported performance, we then check whether the second part of AMULET correctly labels these artificial nuclei as doublets.
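For readers following along, here is a minimal sketch of that remapping step. It is not AMULET's own code. It assumes the singlecell.csv has columns named `barcode`, `cell_id`, and `is__cell_barcode` (adjust the names to your file), and the function names, the `artificial_doublet_N` identifiers, and the way the 2.5% is computed (as a fraction of the original cell count) are all illustrative choices.

```python
import csv
import random


def add_artificial_doublets(rows, fraction=0.025, seed=0,
                            barcode_col="barcode",
                            cell_id_col="cell_id",
                            is_cell_col="is__cell_barcode"):
    """Map randomly chosen pairs of cell barcodes to a shared cell identifier.

    rows: list of dicts, one per singlecell.csv row (column names are assumptions).
    Returns (new_rows, artificial_ids). The input rows are not modified.
    """
    rng = random.Random(seed)
    # Only barcodes flagged as cells are candidates for pairing.
    cells = [r[barcode_col] for r in rows if r[is_cell_col] == "1"]
    n_doublets = round(len(cells) * fraction)
    # Sampling without replacement guarantees each nucleus is used at most once.
    chosen = rng.sample(cells, 2 * n_doublets)

    mapping = {}
    for i in range(n_doublets):
        artificial_id = f"artificial_doublet_{i}"  # hypothetical naming scheme
        mapping[chosen[2 * i]] = artificial_id
        mapping[chosen[2 * i + 1]] = artificial_id

    new_rows = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows stay untouched
        if r[barcode_col] in mapping:
            r[cell_id_col] = mapping[r[barcode_col]]
        new_rows.append(r)
    return new_rows, sorted(set(mapping.values()))


def rewrite_singlecell_csv(in_path, out_path, **kwargs):
    """Read a singlecell.csv, remap pairs, and write the updated CSV."""
    with open(in_path, newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = list(reader)
    new_rows, artificial_ids = add_artificial_doublets(rows, **kwargs)
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(new_rows)
    return artificial_ids
```

The updated CSV would then be passed to the first part of AMULET together with the original, unmodified BAM file, and the returned identifiers kept aside for the later check.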

Best, Alper

Yydtqqqg commented 2 years ago

Thank you for the response! When I used AMULET to test datasets with simulated doublets, I did not know that it could be done so easily! I used a two-column singlecell.csv file where the first column is the barcode and the second column is 0/1. I regenerated a BAM file in which the barcodes of the two nuclei chosen to make up an artificial doublet are replaced by an artificial barcode. For example, if cell ATCG and cell GCTA are to be replaced by cell AAAA, the CB attribute of every record of ATCG and GCTA is replaced with AAAA, and in singlecell.csv ATCG and GCTA are also replaced by AAAA. Is this method also valid? If I want to switch to your method, is all I have to do to come up with a unique cell identifier for each cell in the singlecell.csv file and map the barcodes of the two singlets to the same cell ID?
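To make the BAM-relabeling idea concrete, here is a rough sketch of the CB replacement applied to SAM text lines (for instance, the output of `samtools view -h`). The function name and the `mapping` dictionary are illustrative, and it assumes the barcode is stored in a `CB:Z:` optional field, as in the example above.

```python
def relabel_cb(sam_line, mapping):
    """Replace the CB:Z: barcode on one SAM line according to `mapping`.

    Header lines (starting with '@') and records whose barcode is not in
    `mapping` are returned unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional tags follow them.
    for i in range(11, len(fields)):
        if fields[i].startswith("CB:Z:"):
            barcode = fields[i][len("CB:Z:"):]
            fields[i] = "CB:Z:" + mapping.get(barcode, barcode)
    return "\t".join(fields) + newline


# Example from this comment: ATCG and GCTA both become AAAA.
mapping = {"ATCG": "AAAA", "GCTA": "AAAA"}
record = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:ATCG\n"
relabeled = relabel_cb(record, mapping)
```

Every alignment record has to pass through this function, which is why the approach scales with the size of the BAM file.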

alperoglu commented 2 years ago

I think your way of generating a new BAM file would work the same way, but it would be somewhat time-consuming to go through every record in the aligned reads and change it. For your second question: yes, in our method you only have to map the chosen pair of nuclei to a new artificial cell identifier. You can keep the existing cell barcodes from cellranger-atac as the identifiers for the unselected nuclei.
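Once the pairs are mapped this way, the check described earlier in the thread (whether the artificial nuclei end up labeled as doublets) amounts to a recall calculation. A minimal sketch, assuming you already have the artificial cell identifiers and the set of identifiers reported as multiplets; the function name and the example identifiers are hypothetical, and how the reported identifiers are read from AMULET's output is left out.

```python
def doublet_recall(artificial_ids, called_multiplets):
    """Fraction of artificial doublets that were reported as multiplets."""
    artificial = set(artificial_ids)
    if not artificial:
        raise ValueError("no artificial doublets were provided")
    detected = artificial & set(called_multiplets)
    return len(detected) / len(artificial)


# Illustrative identifiers only: two of the four artificial doublets are called.
artificial_ids = ["art_0", "art_1", "art_2", "art_3"]
called_multiplets = {"art_0", "art_2", "real_cell_17"}
recall = doublet_recall(artificial_ids, called_multiplets)
```

Real cells that are called as multiplets (such as `real_cell_17` here) do not affect this number, since recall only considers the artificial doublets.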