Genometric / MSPC

Using combined evidence from replicates to evaluate ChIP-seq peaks
https://genometric.github.io/MSPC/
GNU General Public License v3.0

What value to set for -c? #150

Closed danielcgingerich closed 3 years ago

danielcgingerich commented 3 years ago

What value would you recommend for the -c argument? I have 24 scATAC datasets, each with anywhere from 1,000 to 10,000 cells. Is 50% a good choice? The cell types are neurons (excitatory/inhibitory), astrocytes, microglia, oligodendrocytes, and OPCs. I am running MSPC on peaks called separately for each cell type (MACS2).

VJalili commented 3 years ago

The value of -c should be set according to your study setup. Below, I'll briefly explain -c and how I used it in my scATAC-seq experiments; hopefully, that gives you some insight for choosing a proper value for your setup.

The argument -c sets the degree of overlap you expect among the called peaks. Peaks that do not satisfy this condition are discarded, and those that satisfy it are processed further (e.g., the combined stringency test and multiple testing correction). For instance, suppose you're studying 3 technical replicates of ChIP-seq (rep1, rep2, and rep3) and you expect every peak from any replicate to overlap with at least one peak from each of the other replicates. In this case, you set -c = 3 (or -c = 100%). Accordingly, a peak from rep1 is discarded if it overlaps with a peak from rep2 but does not overlap with a peak from rep3.
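To make the rep1/rep2/rep3 example concrete, here is a minimal Python sketch of the overlap-count condition only. It is not MSPC's implementation: the function names and the peak coordinates are made up for illustration, peaks are treated as half-open intervals, and counting a peak's own sample toward -c is an assumption chosen to match the example above (3 replicates, -c = 3).

```python
from typing import Dict, List, Tuple

# (chromosome, start, end); half-open intervals are an assumption of this sketch.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def supporting_samples(peak: Peak, own_sample: str,
                       samples: Dict[str, List[Peak]]) -> int:
    """Count the samples that contain a peak overlapping `peak`.

    The peak's own sample is counted once (assumption, see lead-in).
    """
    count = 1
    for name, peaks in samples.items():
        if name != own_sample and any(overlaps(peak, p) for p in peaks):
            count += 1
    return count


def passes_c(peak: Peak, own_sample: str,
             samples: Dict[str, List[Peak]], c: int) -> bool:
    """Hypothetical stand-in for the -c check: keep the peak if enough samples support it."""
    return supporting_samples(peak, own_sample, samples) >= c


# Illustrative coordinates only; not real data.
samples = {
    "rep1": [("chr1", 100, 200), ("chr1", 500, 600)],
    "rep2": [("chr1", 150, 250), ("chr1", 550, 650)],
    "rep3": [("chr1", 180, 300)],
}

# With c = 3, the second rep1 peak is dropped: it overlaps rep2 but not rep3.
kept = [p for p in samples["rep1"] if passes_c(p, "rep1", samples, c=3)]
print(kept)
```

Lowering `c` to 2 in this sketch keeps both rep1 peaks, since each one overlaps a peak in rep2.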

In scATAC-seq studies, peaks in a dataset are called based on the abundance of all the cells' reads in the dataset, and the called peaks are associated with individual cells using the barcodes. In other words, MACS2 calls peaks if their p-value satisfies a given threshold, where the p-value is computed based on the abundance of peaks (and how it differs from the background signal). The called peaks have therefore already passed an "abundance" check, so -c = 1 is not a bad option (-c = 1 effectively disables MSPC's abundance check). Additionally, you may want to consider the degree of similarity or discrepancy you expect between the cells in your experiment: for a higher discrepancy, use a smaller -c (close to 1); for a higher similarity, use a larger -c (close to the number of cells in the dataset, or 100%).
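As a rough aid for comparing candidate values, the sketch below converts a -c setting given either as a count or as a percentage into a minimum number of supporting samples, using the 24 datasets from the question. The helper `min_samples` is hypothetical, and rounding a fractional result up is an assumption of this sketch; check the MSPC documentation for the exact rule the tool applies.

```python
import math


def min_samples(c: str, n_samples: int) -> int:
    """Translate a -c style value ("12" or "50%") into a sample count.

    Rounding up for percentages is an assumption, not MSPC's documented rule.
    The result is clamped to the range [1, n_samples].
    """
    if c.endswith("%"):
        fraction = float(c[:-1]) / 100.0
        needed = math.ceil(fraction * n_samples)
    else:
        needed = int(c)
    return max(1, min(needed, n_samples))


n_datasets = 24  # number of scATAC datasets mentioned in the question

for setting in ["1", "50%", "100%"]:
    print(setting, "->", min_samples(setting, n_datasets), "of", n_datasets)
```

Under these assumptions, 50% of 24 datasets means a peak needs support from 12 datasets, while a setting of 1 is satisfied by every peak on its own.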

VJalili commented 3 years ago

@danielcgingerich Thank you for using MSPC. I assume the above explanation answered your question, so I am closing this issue. Please feel free to reopen it if it did not, or to create separate issues if you have other questions.