GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

ArchR returning empty character(0) for ArrowFiles #1648

Closed: yojetsharma closed this issue 2 years ago

yojetsharma commented 2 years ago

I am running ArchR on my own scATAC-seq dataset. The initial steps of Arrow file creation run, but the process appears stuck at "Creating ArrowFile From Temporary File, 9.113 mins elapsed" for over two hours. Once it finishes, ArrowFiles is empty (character(0)).

> library(ArchR)

                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
        ,--' ,----`-,__ ___/'  --,-`-===================##========>
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\________________,--\\_##,/
           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |    
        /  /_\  \   |      /     |  |     |   __   | |      /     
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _| `._____| \______||__|  |__| | _| `._____|

ArchR : Version 1.0.2
For more information see our website : www.ArchRProject.com
If you encounter a bug please report : https://github.com/GreenleafLab/ArchR/issues
Loading Required Packages...
    Loading Package : grid v4.1.2
    Loading Package : gridExtra v2.3
    Loading Package : gtools v3.9.3
    Loading Package : gtable v0.3.1
    Loading Package : ggplot2 v3.3.6
    Loading Package : magrittr v2.0.3
    Loading Package : plyr v1.8.7
    Loading Package : stringr v1.4.1
    Loading Package : data.table v1.14.2
    Loading Package : matrixStats v0.62.0
    Loading Package : S4Vectors v0.32.4
    Loading Package : GenomicRanges v1.46.1
    Loading Package : BiocGenerics v0.40.0
    Loading Package : Matrix v1.5.1
    Loading Package : Rcpp v1.0.9
    Loading Package : SummarizedExperiment v1.24.0
    Loading Package : rhdf5 v2.38.1
Setting default number of Parallel threads to 16.
> inputFiles = getInputFiles("/home/newfolder/outs/")
> inputFiles
                                                            possorted 
"/home/newfolder/outs//possorted_bam.bam"
> addArchRGenome("hg38")
Setting default genome to Hg38.
> addArchRThreads(threads = 16) 
Setting default number of Parallel threads to 16.
>  ArrowFiles <- createArrowFiles(inputFiles = inputFiles,sampleNames = names(inputFiles),minTSS = 4,minFrags = 1000,addTileMat = TRUE,addGeneScoreMat = TRUE)
Using GeneAnnotation set by addArchRGenome(Hg38)!
Using GeneAnnotation set by addArchRGenome(Hg38)!
ArchR logging to : ArchRLogs/ArchR-createArrows-3e077720032e-Date-2022-09-28_Time-11-16-36.log
If there is an issue, please report to github with logFile!
2022-09-28 11:16:36 : Batch Execution w/ safelapply!, 0 mins elapsed.
2022-09-28 11:16:36 : (possorted : 1 of 1) Arrow Exists! Overriding since not completed!, 0.001 mins elapsed.
2022-09-28 11:16:36 : (possorted : 1 of 1) Reading In Fragments from inputFiles (readMethod = bam), 0.001 mins elapsed.
2022-09-28 11:16:36 : (possorted : 1 of 1) Tabix Bam To Temporary File, 0.002 mins elapsed.
2022-09-28 11:19:31 : (possorted : 1 of 1) Reading BamFile 28 Percent, 2.92 mins elapsed.
2022-09-28 11:22:19 : (possorted : 1 of 1) Reading BamFile 55 Percent, 5.722 mins elapsed.
2022-09-28 11:24:07 : (possorted : 1 of 1) Reading BamFile 82 Percent, 7.523 mins elapsed.
2022-09-28 11:25:43 : (possorted : 1 of 1) Successful creation of Temporary File, 9.113 mins elapsed.
2022-09-28 11:25:43 : (possorted : 1 of 1) Creating ArrowFile From Temporary File, 9.113 mins elapsed.
2022-09-28 12:52:52 : ERROR Found in .tmpToArrow for (possorted : 1 of 1) 
LogFile = ArchRLogs/ArchR-createArrows-3e077720032e-Date-2022-09-28_Time-11-16-36.log

<simpleError: Detected 2 or less cells (0 barcodes have greater than 50 fragments) in file!
       Check inputs such as 'minFrags' or 'maxFrags' to keep cells!
       Also check that you are using the correct reference genome.
       Exiting!
************************************************************

2022-09-28 12:52:52 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..
ArchR logging successful to : ArchRLogs/ArchR-createArrows-3e077720032e-Date-2022-09-28_Time-11-16-36.log
> ArrowFiles
character(0)
> traceback()
3: gc()
2: closeAllConnections()
1: sys.save.image(".RData")
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /ncbs_gs/nlsas_data/usershares/praghu/yojetsharma/.conda/envs/scenic/lib/libopenblasp-r0.3.21.so

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] Rsamtools_2.10.0                  BSgenome.Hsapiens.UCSC.hg38_1.4.4
 [3] BSgenome_1.62.0                   rtracklayer_1.54.0               
 [5] Biostrings_2.62.0                 XVector_0.34.0                   
 [7] rhdf5_2.38.1                      SummarizedExperiment_1.24.0      
 [9] Biobase_2.54.0                    MatrixGenerics_1.6.0             
[11] Rcpp_1.0.9                        Matrix_1.5-1                     
[13] GenomicRanges_1.46.1              GenomeInfoDb_1.30.1              
[15] IRanges_2.28.0                    S4Vectors_0.32.4                 
[17] BiocGenerics_0.40.0               matrixStats_0.62.0               
[19] data.table_1.14.2                 stringr_1.4.1                    
[21] plyr_1.8.7                        magrittr_2.0.3                   
[23] ggplot2_3.3.6                     gtable_0.3.1                     
[25] gtools_3.9.3                      gridExtra_2.3                    
[27] ArchR_1.0.2                      

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.2         purrr_0.3.4              lattice_0.20-45         
 [4] colorspace_2.0-3         vctrs_0.4.1              generics_0.1.3          
 [7] yaml_2.3.5               XML_3.99-0.10            utf8_1.2.2              
[10] rlang_1.0.6              pillar_1.8.1             glue_1.6.2              
[13] withr_2.5.0              DBI_1.1.3                BiocParallel_1.28.3     
[16] GenomeInfoDbData_1.2.7   lifecycle_1.0.2          zlibbioc_1.40.0         
[19] munsell_0.5.0            restfulr_0.0.15          Cairo_1.6-0             
[22] fansi_1.0.3              scales_1.2.1             DelayedArray_0.20.0     
[25] rjson_0.2.21             stringi_1.7.8            dplyr_1.0.10            
[28] BiocIO_1.4.0             cli_3.4.1                tools_4.1.2             
[31] bitops_1.0-7             rhdf5filters_1.6.0       RCurl_1.98-1.8          
[34] tibble_3.1.8             crayon_1.5.1             pkgconfig_2.0.3         
[37] assertthat_0.2.1         Rhdf5lib_1.16.0          R6_2.5.1                
[40] GenomicAlignments_1.30.0 compiler_4.1.2
>

ArchR-createArrows-3e077720032e-Date-2022-09-28_Time-11-16-36.log

rcorces commented 2 years ago

Hi @yojetsharma! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces commented 2 years ago

You are using BAM files as input, which requires additional specification. Your error indicates that ArchR cannot find the cell barcode information. Please see the BAM-related parameters in the documentation for createArrowFiles(): https://www.archrproject.com/reference/createArrowFiles.html
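
For readers hitting the same error, here is a minimal sketch of what "additional specification" could look like. createArrowFiles() accepts the BAM-related arguments bcTag, gsubExpression and bamFlag. The value "CB" below is an assumption: it is the tag Cell Ranger ATAC conventionally uses for corrected cell barcodes, and the thread does not say which setting resolved this case. The input path is copied from the original post, and the call only runs if ArchR is installed and the file exists.

```r
# Arguments from the original post, plus a BAM-specific barcode tag.
bam_args <- list(
  inputFiles = c(possorted = "/home/newfolder/outs//possorted_bam.bam"),
  sampleNames = "possorted",
  minTSS = 4,
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE,
  # ASSUMPTION: cell barcodes are stored in the "CB" tag of each read.
  # The default is bcTag = "qname", which takes the barcode from the read name.
  bcTag = "CB"
)

# Only attempt the call when ArchR and the BAM file are actually available.
if (requireNamespace("ArchR", quietly = TRUE) &&
    file.exists(bam_args$inputFiles)) {
  ArchR::addArchRGenome("hg38")
  ArrowFiles <- do.call(ArchR::createArrowFiles, bam_args)
}
```

If the barcode is embedded in the read name instead, gsubExpression is the argument intended for extracting it; check the reference page above for the exact semantics.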

<simpleError: Detected 2 or less cells (0 barcodes have greater than 50 fragments) in file!
       Check inputs such as 'minFrags' or 'maxFrags' to keep cells!
       Also check that you are using the correct reference genome.
       Exiting!
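
A likely reading of "0 barcodes have greater than 50 fragments" is that each read was treated as its own barcode, so no barcode accumulated enough fragments. The toy example below (base R only, with hypothetical barcodes and read names; it is not ArchR's internal code) illustrates the effect of grouping by the wrong field.

```r
# Count how many barcodes have more than `min_frags` fragments.
count_passing_barcodes <- function(barcodes, min_frags = 50) {
  frags_per_barcode <- table(barcodes)
  sum(frags_per_barcode > min_frags)
}

# Hypothetical toy data: 3 cells with 100 fragments each.
true_barcodes <- rep(c("AAAC-1", "CCGT-1", "GGTA-1"), each = 100)

# Read names are unique per fragment, so using them as "barcodes"
# yields 300 groups of size 1.
read_names <- paste0("read", seq_along(true_barcodes))

count_passing_barcodes(true_barcodes)  # 3
count_passing_barcodes(read_names)     # 0
```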

yojetsharma commented 2 years ago

Worked! Thank you!