VCCRI / Sierra

Discover differential transcript usage from polyA-captured single cell RNA-seq data
GNU General Public License v3.0
49 stars 17 forks source link

Coordinates of peaks across junctions after merging #67

Closed chvandijk closed 11 months ago

chvandijk commented 11 months ago

Dear,

I've been using the pipeline in the vignette. After merging peak files generated from individual samples in a single cell RNA-seq experiment I noticed the following line in the merged peak file:

SLC4A8 chr12 1 51424378 51458152 SLC4A8:chr12:51424378-51458152:1 Merged SLC4A8:chr12:51424684-51457903:1 SAMPLE1 across-junctions (51424723,51425034)(51440708,51440788)(51450876,51451021)(51452124,51452258)(51453539,51453698)(51457351,51457903) Whereby the start site of the peak (51424378) is upstream of the first exon (51424723,51425034) and the end site of the peak (51458152) is downstream of the last exon (51457351,51457903). Does this mean that the peak also spans introns? And what coordinates are used for UMI counting for this peak?

Thanks!

rj-patrick commented 11 months ago

Hi @chvandijk,

Yes, the peak calling in Sierra takes splice junctions into account, so peaks can span introns. The UMI counting is performed on the peak coordinates themselves.

Cheers, Ralph

chvandijk commented 11 months ago

Hi Ralph,

Thanks for the reply! I have another quick question: I'm using replicates instead of populations and when I was browsing through the code of the DUtest() function I noticed that when making the pseudo bulk count data the function only sums the counts. I have different number of cells for every replication, so is there a way to correct for this? Otherwise, I think I will be looking at cell number differences instead of expression differences.

Specifically, this line of code:

 ## create a profile set for first cluster
  profile.set1 = matrix(, nrow = length(peaks.use), ncol = length(cell.sets1))
  for (i in 1:length(cell.sets1)) {
    this.set <- cell.sets1[[i]]
    sub.matrix <- Seurat::GetAssayData(apa.seurat.object, slot = "counts", assay = "RNA")[peaks.use, this.set]
    if (length(this.set) > 1) {
      this.profile <- as.numeric(apply(sub.matrix, 1, function(x) sum(x)))
      profile.set1[, i] <- this.profile
    } else {
      profile.set1[, i] <- sub.matrix
    }
  }

from the apply_DEXSeq_test_seurat function

Best, Charlotte

rj-patrick commented 11 months ago

Hi Charlotte,

The pseudobulk counts will be normalised as part of the DEXSeq testing, so unless some of your replicates have every low cell numbers, you should be fine.

Cheers, Ralph

chvandijk commented 11 months ago

Thanks!

rj-patrick commented 11 months ago

I'll close this issue now, but if you have any more problems feel free to reopen.