GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)

projectBulkATAC error #111

Closed badoi closed 4 years ago

badoi commented 4 years ago

I am projecting a RangedSummarizedExperiment of bulk ATAC-seq data onto the snATAC-seq project. Running with threads = 1 or with automatic detection produces two similar errors, both raised from the same internal call.

With 1 thread:

> projected = projectBulkATAC(
+   ArchRProj = projMouse,
+   seATAC = rse,
+   reducedDims = "IterativeLSI",
+   embedding = "UMAP", threads =1,
+   n = 250)
ArchR logging to : ArchRLogs/ArchR-projectBulkATAC-6afbe5e54c9bf-Date-2020-05-06_Time-03-40-44.log
If there is an issue, please report to github with logFile!
2020-05-06 03:40:45 : Overlap Ratio of Reduced Dims Features = 0.80708
2020-05-06 03:40:45 : Projecting Sample (1 of 20), 0.019 mins elapsed.
Error in `[.default`(summary(as(simMat, "dgCMatrix")), , -1, drop = FALSE) : 
  incorrect number of dimensions

With multithread:

> projected = projectBulkATAC(
+   ArchRProj = projMouse,
+   seATAC = rse,
+   reducedDims = "IterativeLSI",
+   embedding = "UMAP", threads =20,
+   n = 250)
ArchR logging to : ArchRLogs/ArchR-projectBulkATAC-6afbe3e33954a-Date-2020-05-06_Time-03-41-10.log
If there is an issue, please report to github with logFile!
2020-05-06 03:41:11 : Overlap Ratio of Reduced Dims Features = 0.80708
2020-05-06 03:41:12 : Projecting Sample (1 of 20), 0.024 mins elapsed.
2020-05-06 03:41:12 : Projecting Sample (2 of 20), 0.031 mins elapsed.
2020-05-06 03:41:13 : Projecting Sample (3 of 20), 0.038 mins elapsed.
2020-05-06 03:41:13 : Projecting Sample (4 of 20), 0.046 mins elapsed.
2020-05-06 03:41:13 : Projecting Sample (5 of 20), 0.053 mins elapsed.
2020-05-06 03:41:14 : Projecting Sample (6 of 20), 0.06 mins elapsed.
2020-05-06 03:41:14 : Projecting Sample (7 of 20), 0.068 mins elapsed.
2020-05-06 03:41:15 : Projecting Sample (8 of 20), 0.076 mins elapsed.
2020-05-06 03:41:15 : Projecting Sample (9 of 20), 0.084 mins elapsed.
2020-05-06 03:41:16 : Projecting Sample (10 of 20), 0.091 mins elapsed.
2020-05-06 03:41:16 : Projecting Sample (11 of 20), 0.099 mins elapsed.
2020-05-06 03:41:17 : Projecting Sample (12 of 20), 0.107 mins elapsed.
2020-05-06 03:41:17 : Projecting Sample (13 of 20), 0.114 mins elapsed.
2020-05-06 03:41:18 : Projecting Sample (14 of 20), 0.121 mins elapsed.
2020-05-06 03:41:18 : Projecting Sample (15 of 20), 0.129 mins elapsed.
2020-05-06 03:41:19 : Projecting Sample (16 of 20), 0.137 mins elapsed.
2020-05-06 03:41:19 : Projecting Sample (17 of 20), 0.144 mins elapsed.
2020-05-06 03:41:19 : Projecting Sample (18 of 20), 0.152 mins elapsed.
2020-05-06 03:41:20 : Projecting Sample (19 of 20), 0.16 mins elapsed.
2020-05-06 03:41:21 : Projecting Sample (20 of 20), 0.175 mins elapsed.
Error in .safelapply(seq_len(ncol(bulkMat)), function(x) { : 
Error Found Iteration 1 : 
        [1] "Error in `[.default`(summary(as(simMat, \"dgCMatrix\")), , -1, drop = FALSE) : \n  incorrect number of dimensions\n"
        <simpleError in `[.default`(summary(as(simMat, "dgCMatrix")), , -1, drop = FALSE): incorrect number of dimensions>
Error Found Iteration 2 : 
        [1] "Error in `[.default`(summary(as(simMat, \"dgCMatrix\")), , -1, drop = FALSE) : \n  incorrect number of dimensions\n"
        <simpleError in `[.default`(summary(as(simMat, "dgCMatrix")), , -1, drop = FALSE): incorrect number of dimensions>
Error Found Iteration 3 : 
        [1] "Error in `[.default`(summary(as(simMat, \"dgCMatrix\")), , -1, drop = FALSE) : \n  incorrect number of dimensions\n"
        <simpleError in `[.default`(summary(as(simMat, "dgCMatrix")), , -1, drop = FALSE): incorrect number of dimensions>
Error Found Iteration 4 : 
        [1] "Error in `[.default`(summary(as(simMat, \"dgCMatrix\")), , -1, drop = FALSE) : \n  incorrect number of dimensions\n"
        <simpleError in `[.default`(summary(as(s
In addition: Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
  20 function calls resulted in an error

I poked around the BulkProjection.R script but wasn't able to recreate the error when running the steps by hand, and I got the projections that way. I think there is an issue in creating the simRD data frame at this line:

...
simMat <- summary(as(simMat, "dgCMatrix"))[,-1,drop=FALSE]
...
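For readers following along, here is a minimal sketch of what that line expects, assuming only the Matrix package. It is not ArchR's code: the toy `simMat` and the name `simDF` are made up for illustration. The point is that the Matrix method of `summary` returns a triplet table (columns `i`, `j`, `x`), which supports two-dimensional indexing. If a different `summary` method were picked up inside the package, the result might not be two-dimensional, which would be consistent with the "incorrect number of dimensions" error, though that is a guess and not something confirmed here.

```r
# Minimal sketch (not ArchR's code): what the failing line expects to receive.
library(Matrix)

# Toy stand-in for simMat: a 3 x 2 sparse matrix with four non-zero entries.
# sparseMatrix() already returns a dgCMatrix, so no as(..., "dgCMatrix") is needed.
simMat <- sparseMatrix(
  i = c(1, 3, 2, 3),
  j = c(1, 1, 2, 2),
  x = c(5, 2, 7, 1)
)

# Explicitly namespaced call, so the Matrix method for sparse matrices is used.
# It returns a data.frame-like triplet table with columns i, j and x.
triplets <- Matrix::summary(simMat)

# Two-dimensional indexing works on that table: drop the first column (i).
simDF <- triplets[, -1, drop = FALSE]
print(colnames(simDF))
print(nrow(simDF))
```

Calling the function as `Matrix::summary` removes any ambiguity about which method is dispatched; whether that matches the fix that went into ArchR is not stated in this thread.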

Session Info

R version 3.6.3 (2020-02-29)                                                                                                       
Platform: x86_64-conda_cos6-linux-gnu (64-bit)                                                                                     
Running under: CentOS Linux 7 (Core)                                                                                               

Matrix products: default                                                                                                           
BLAS/LAPACK: /home/bnphan/miniconda3/lib/libopenblasp-r0.3.9.so                                                                    

locale:                                                                                                                            
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C                                                                                       
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8                                                                             
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8                                                                            
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                                                                                          
 [9] LC_ADDRESS=C               LC_TELEPHONE=C                                                                                     
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C                                                                                

attached base packages:                                                                                                            
 [1] grid      parallel  stats4    stats     graphics  grDevices utils                                                             
 [8] datasets  methods   base                                                                                                      

other attached packages:                                                                                                           
 [1] chromVARmotifs_0.2.0               motifmatchr_1.8.0                                                                          
 [3] gridExtra_2.3                      BSgenome.Mmusculus.UCSC.mm10_1.4.0                                                         
 [5] BSgenome_1.54.0                    rtracklayer_1.46.0                                                                         
 [7] Biostrings_2.54.0                  XVector_0.26.0                                                                             
 [9] gtable_0.3.0                       Seurat_3.1.5                                                                               
[11] SingleCellExperiment_1.8.0         ArchR_0.9.3                                                                                
[13] magrittr_1.5                       rhdf5_2.30.1                                                                               
[15] Matrix_1.2-18                      data.table_1.12.8                                                                          
[17] SummarizedExperiment_1.16.1        DelayedArray_0.12.3                                                                        
[19] BiocParallel_1.20.1                matrixStats_0.56.0                                                                         
[21] Biobase_2.46.0                     GenomicRanges_1.38.0                                                                       
[23] GenomeInfoDb_1.22.1                IRanges_2.20.2                                                                             
[25] S4Vectors_0.24.4                   BiocGenerics_0.32.0                                                                        
[27] ggplot2_3.3.0                                                                                                                 

ArchR-projectBulkATAC-6afbe1a6132fd-Date-2020-05-06_Time-02-46-06.log ArchR-projectBulkATAC-6afbe62530b8d-Date-2020-05-06_Time-02-42-42.log

jgranja24 commented 4 years ago

Sigh, this must be a generics issue. Thanks for the info, I am looking into it. It's really weird how code works on its own but behaves differently once it's in a package.

jgranja24 commented 4 years ago

This is now functional in the new master update. See projectBulkATAC; we haven't updated the documentation yet, but we tested it and it now seems to work on our end. Closing this issue; feel free to reopen if there are further issues.

willey2020 commented 4 years ago

Hello! Could I ask how I should properly import bulk ATAC reads/bam files into the seATAC SummarizedExperiment object? Thank you very much!

badoi commented 4 years ago

You'll need to define a set of genomic regions with a BED file or a GTF/GFF file of your choice. I like using the featureCounts function from the Subread package to get a count matrix from BAM files over that set of regions, which then goes into an SE object.
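A rough sketch of that workflow in R, assuming the Rsubread, GenomicRanges, IRanges and SummarizedExperiment packages from Bioconductor. The helper names `bedToSAF` and `bamsToSE`, the toy peak table, the BAM file names and the choice of `counts` as the assay name are all hypothetical, so check what projectBulkATAC expects before relying on it.

```r
# Sketch only: bedToSAF() and bamsToSE() are hypothetical helper names,
# not part of ArchR or Rsubread.

# Convert BED-like regions (0-based starts) into the SAF table that
# Rsubread::featureCounts accepts (1-based, inclusive coordinates).
bedToSAF <- function(bed) {
  data.frame(
    GeneID = paste0(bed$chr, ":", bed$start + 1L, "-", bed$end),
    Chr    = bed$chr,
    Start  = bed$start + 1L,
    End    = bed$end,
    Strand = "+",  # placeholder: peaks are unstranded
    stringsAsFactors = FALSE
  )
}

# Count reads per region and wrap the result in a RangedSummarizedExperiment.
# Only defined here; calling it needs real BAM files and the Bioconductor packages.
bamsToSE <- function(bamFiles, saf, pairedEnd = TRUE, threads = 1) {
  fc <- Rsubread::featureCounts(
    files = bamFiles,
    annot.ext = saf,
    isPairedEnd = pairedEnd,
    nthreads = threads
  )
  # Build ranges from the annotation returned alongside the counts,
  # so rows of the count matrix and ranges stay aligned.
  regions <- GenomicRanges::GRanges(
    seqnames = fc$annotation$Chr,
    ranges = IRanges::IRanges(
      start = as.integer(fc$annotation$Start),
      end = as.integer(fc$annotation$End)
    )
  )
  SummarizedExperiment::SummarizedExperiment(
    assays = list(counts = fc$counts),
    rowRanges = regions
  )
}

# Toy regions standing in for a peak set read from a BED file.
peaks <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(99L, 499L),
  end = c(600L, 1000L),
  stringsAsFactors = FALSE
)
saf <- bedToSAF(peaks)
print(saf)

# With real data (file names are placeholders):
# rse <- bamsToSE(c("sample1.bam", "sample2.bam"), saf)
```

The resulting object would then be passed as `seATAC` to projectBulkATAC, as in the call at the top of this issue.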

willey2020 commented 4 years ago

Thank you so much BaDoi, I will give it a try right away following your suggestion.

willey2020 commented 3 years ago

@badoi Thank you! I prepared the peak count file and imported it into a SummarizedExperiment, and it seems to work in the projectBulkATAC process.

Dear Ryan @rcorces and Jeff @jgranja24, could I ask a question about a line in the source code for projecting bulk ATAC: "Error incosistency found with matching LSI dimensions to those used in addEmbedding". In what situation does this error occur? In one of my ArchR projects, I would like to remove several LSI dimensions when preparing the UMAP in order to remove sequencing bias. Could that produce this inconsistency? Thank you all!

rcorces commented 3 years ago

@willey2020 - I do not think this would cause the mentioned error but I have never tested this. But from what I can tell from glancing at the code, it should be very difficult to arrive at that error.

willey2020 commented 3 years ago

@rcorces Thank you so much! I will give it a try. Thank you again!