VCCRI / Sierra

Discover differential transcript usage from polyA-captured single cell RNA-seq data
GNU General Public License v3.0
49 stars 17 forks source link

peak discrepancy #47

Open cag1343 opened 2 years ago

cag1343 commented 2 years ago

Hi,

I'm trying to re-analyze some already published data. previously analyzed data with gene counts detects 17473 features and 56902 cells, my dataset is only able to detect 394 peaks and 170706 cells. which is weird this number should be higher than the number of genes feature or at least equal, is there a point where something could have gone wrong I need to check? thanks.

seuratPeaks <- NewPeakSeurat( peak.counts, peak.annotations, min.cells = 0, min.peaks = 0 )

Same data but with function: peaks.seurat <- PeakSeuratFromTransfer(peak.data = peak.counts, genes.seurat = published.seurat_object, annot.info = peak.annotations, min.cells = 0, min.peaks = 0) Gives instead: 394 peaks and 56902 cells

I don't understand why the two functions give 2 different outputs.

Thanks!

rj-patrick commented 2 years ago

Hi @cag1343,

Apologies for the slow response. Regarding the cell numbers, do you mean that there were originally 170,706 cells, and this is retained with NewPeakSeurat, but after running the PeakSeuratFromTransfer you are left with 56902? I'm assuming you are working with multiple runs/experiments that have been combined together? The likely explanation for the 'missing' cells is that there is a discrepancy with the cell barcode suffix. Sierra by default follows the CellRanger strategy of appending '-1', '-2' etc. to cell barcodes according to the order of datasets that are provided to the AggregatePeakCounts function. If this results in suffixes that are different to what is in your Seurat object those cells will be excluded.

Regarding peak numbers, 394 peaks is definitely unexpected. Are you using the same reference file that was used for the alignment of the BAM files and that generated the 17,473 features?

Cheers, Ralph