GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
365 stars 129 forks source link

Subset the peak matrix or filter out peaks from the ArchR project. #549

Closed miaomiaotao9 closed 3 years ago

miaomiaotao9 commented 3 years ago

Describe the problem that your feature request would address. Hello, I was wondering is there any possibility that I can subset the peak matrix or filter out some peaks for the ArchR project.

Describe the solution you'd like I have bulk ATAC data, with these data I generated a list of peaks of interest. I want to subset the single-cell ATAC-seq peak using these data. Basically, I want to keep the peaks only have overlap with my own peak list, and using these new peak matrix to do the downstream analysis.

I have met errors when I use this new matrix to add background peaks.

Here is the code I am using:

projHeme5 <- addPeakMatrix(projHeme4) # ARchR project after peak calling

sc_gr <- projHeme5@peakSet #peak set from the single cell data bulk_gr <- GRanges(bulk_matrix) #peat set from my bulk data

bulk_sc_overlap <- subsetByOverlaps.keepAllMeta(bulk_gr,sc_gr) # find overlaps

add new matrix to the ArchR project

projHeme6 <- addPeakSet(

ArchRProj = projHeme4, peakSet = bulk_sc_overlap, genomeAnnotation = getGenomeAnnotation(projHeme4), force = TRUE )

add annotation

projHeme6 <- addMotifAnnotations(ArchRProj = projHeme6, motifSet = "cisbp", name = "Motif",force = TRUE)

However, when I tried to add background peaks, I got an error like this:

projHeme6 <- addBgdPeaks(projHeme6,method = "chromVAR")

Identifying Background Peaks! Error in [[<-(*tmp*, name, value = c(773682L, 845461L, 962060L, 974946L, : 24531 elements in value to replace 181664 elements

I also tried to use "ArchR" method, but I got the same error.

projHeme6 <- addBgdPeaks(projHeme6,method = "ArchR") Identifying Background Peaks! Error in [[<-(*tmp*, name, value = c(773682L, 845461L, 962060L, 974946L, : 24531 elements in value to replace 181664 elements

I was wondering is there any possibility that I can subset the matrix and filter out the peaks that are not on my list?

Thank you all in advance.

session_info()

─ Session info setting value
version R version 4.0.3 (2020-10-10) os Ubuntu 18.04.5 LTS
system x86_64, linux-gnu
ui RStudio
language en_GB:en
collate en_GB.UTF-8
ctype en_GB.UTF-8
tz Europe/London
date 2021-02-24

─Packages package version date lib source
annotate 1.68.0 2020-10-27 [1] Bioconductor
AnnotationDbi 1.52.0 2020-10-27 [1] Bioconductor
ArchR
1.0.1 2021-02-09 [1] Github (GreenleafLab/ArchR@023bcb0)

rcorces commented 3 years ago

I think this should work the way you have done it. Could you create a minimal reproducible example? From your code it looks like you are getting this error with the tutorial data set? If so, can you create a minimal GRanges object for your "bulk" peaks so that we can recapitulate this behavior on our end?

jgranja24 commented 3 years ago

Hi @miaomiaotao9, my guess is that you already ran addPeakMatrix with the previous peak set. Background peak computation will take in the average accessibility across all peaks and this will have a different length hence the error. I think if you run addPeakMatrix(force=TRUE) it will fix your problem!

miaomiaotao9 commented 3 years ago

Hi, thank you very much for your help! The problem was fixed after I run addPeakMatrix(force=TRUE) to the new ArchR project.

rcorces commented 3 years ago

I think it would be ideal to improve the error catching here because the error is quite opaque.

miaomiaotao9 commented 3 years ago

Hi, I can provide the bulk peak list. The single-cell data was from the paper Granja* et al. Nature Biotechnology 2019, sample GSM4138891_scATAC_CD34_D8T1.

bulk_subset_matrix.tsv.gz

jgranja24 commented 3 years ago

This should be now handled in the newest release.