RoseYuan / sc_chromatin_benchmark

Benchmarking computational methods for single-cell ATAC-seq and CUT&Tag

Quality control method is not specified #1

Open newtonharry opened 1 month ago

newtonharry commented 1 month ago

Hi,

I just want to say that I've read your paper recently and I'm grateful for the work you've done. One gap I've noticed is that no code is provided for the QC of the fragment files. I'm aware of the thresholds you've specified in the paper, but I want to make sure I reproduce the QC properly. Specifically, I'm attempting to reproduce the snapATAC2 results for the PBMC10K Multiome data from cleaned_data.zip. I'm assuming that cleaned_data.zip contains the QC'd data. If not, please let me know. Here are the QC plots I generated.

image image

According to the snapATAC2 workflow, these plots suggest that QC has not been properly applied to the data. The low-density regions of TSSE should be filtered out, as they are likely to be low-quality cells. If I've missed something regarding the QC, please let me know. What is your justification for the QC parameters in the paper?

Here is the link to the QC for snapATAC2: https://kzhang.org/SnapATAC2/version/2.6/tutorials/pbmc.html

RoseYuan commented 1 month ago

Thank you for your interest in our work!

The code for preprocessing and QC is deposited at: https://github.com/RoseYuan/benchmark_paper/tree/master/data. Specifically, for the 10XPBMC multiome dataset, the QC for RNA is in 1_PBMC10X_RNA_preprocessing.Rmd, and the QC for ATAC is in 3_PBMC_multiomics_ATAC.ipynb.

As you can see from the Jupyter notebook and SnapATAC2's tutorial, before filtering there is a cluster of barcodes with low TSSE scores. I'm using ArchR for QC and choosing the QC parameters according to the following plot to exclude this cluster, which is similar to what is done in SnapATAC2's tutorial. Note that the TSSE score is calculated slightly differently in ArchR and SnapATAC2, so the exact values, as well as the thresholds applied, may differ between pipelines. However, both should work in terms of excluding the low-quality cluster.

image
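
As a rough illustration of the threshold-based filtering described above, here is a minimal sketch in plain Python. It is not the ArchR or SnapATAC2 code from the notebook: the `Barcode` class, the `filter_barcodes` helper, the toy metric values, and both cutoffs are hypothetical and only show the idea of dropping the low-TSSE cluster.

```python
from dataclasses import dataclass


@dataclass
class Barcode:
    """Per-barcode QC metrics (hypothetical container for illustration)."""
    name: str
    n_fragments: int
    tsse: float


def filter_barcodes(barcodes, min_tsse, min_fragments):
    """Keep barcodes that meet both thresholds (inclusive)."""
    return [
        b for b in barcodes
        if b.tsse >= min_tsse and b.n_fragments >= min_fragments
    ]


# Toy metrics: made-up values, not taken from the PBMC dataset.
barcodes = [
    Barcode("AAAC-1", 12000, 18.2),  # high TSSE, many fragments
    Barcode("AAAG-1", 900, 2.1),     # low TSSE, few fragments
    Barcode("AACT-1", 15000, 3.0),   # many fragments but low TSSE
    Barcode("ACGT-1", 8000, 11.5),   # passes both cutoffs
]

# Hypothetical cutoffs; in practice they are read off the QC plot
# produced by the same pipeline that computed the TSSE scores.
kept = filter_barcodes(barcodes, min_tsse=8.0, min_fragments=1000)
print([b.name for b in kept])  # ['AAAC-1', 'ACGT-1']
```

Because the TSSE calculation differs between pipelines, a cutoff like `min_tsse` would need to be chosen from that pipeline's own score distribution rather than copied from another tool.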

newtonharry commented 3 weeks ago

Thank you for the response! That's very helpful.

I would also like to recommend that you start a discussion board on your repository, so ideas and questions can be talked about, without having to use the issues section.