GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Error when subsetArchRProject #1880

Open fe4960 opened 1 year ago

fe4960 commented 1 year ago

Hello,

Attach your log file

This step doesn't generate a log file.

Describe the bug

I tried to subset an ArchR project with the code below:

proj2=subsetArchRProject( ArchRProj = proj1, cells = final_cell, outputDirectory = dir, dropCells = TRUE)

It generated the error below and did not complete:

Copying ArchRProject to new outputDirectory : human_meta/data/proj4_clean1_final
Copying Arrow Files...
Error in .safelapply(seq_along(inArrows), function(x) { :
  Error Found Iteration 51 : [1] "Error in [.data.frame(.h5read(inArrow, h5name), idxKeep) : \n undefined columns selected\n"
  <simpleError in [.data.frame(.h5read(inArrow, h5name), idxKeep): undefined columns selected>
Calls: subsetArchRProject ... saveArchRProject -> .copyArrows -> unlist -> .safelapply
In addition: Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
  1 function calls resulted in an error
Execution halted

I used the latest version of ArchR (v1.0.3), because v1.0.2 generated a different error. The same code with v1.0.3 ran successfully on other datasets. I wonder what causes the error with this dataset and whether you can help fix it. Thanks a lot!

I searched previous issues and found that this error has not been solved before.

Session Info

R version 4.1.0 (2021-05-18)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] rhdf5_2.38.1 SummarizedExperiment_1.24.0
[3] Biobase_2.54.0 RcppArmadillo_0.11.0.0.0
[5] Rcpp_1.0.9 Matrix_1.5-3
[7] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[9] IRanges_2.28.0 S4Vectors_0.32.4
[11] BiocGenerics_0.40.0 sparseMatrixStats_1.6.0
[13] MatrixGenerics_1.6.0 matrixStats_0.63.0
[15] data.table_1.14.6 stringr_1.5.0
[17] plyr_1.8.8 magrittr_2.0.3
[19] ggplot2_3.4.0 gtable_0.3.1
[21] gtools_3.9.4 gridExtra_2.3
[23] devtools_2.4.4 usethis_2.1.6
[25] ArchR_1.0.3

loaded via a namespace (and not attached):
[1] pkgload_1.3.0 shiny_1.7.1 assertthat_0.2.1
[4] GenomeInfoDbData_1.2.7 remotes_2.4.2 sessioninfo_1.2.2
[7] pillar_1.7.0 lattice_0.20-45 glue_1.6.2
[10] digest_0.6.30 promises_1.2.0.1 XVector_0.34.0
[13] colorspace_2.0-3 htmltools_0.5.2 httpuv_1.6.5
[16] pkgconfig_2.0.3 zlibbioc_1.40.0 purrr_0.3.4
[19] xtable_1.8-4 scales_1.2.1 processx_3.5.3
[22] later_1.3.0 tibble_3.1.6 generics_0.1.2
[25] ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0
[28] cli_3.5.0 crayon_1.5.2 mime_0.12
[31] memoise_2.0.1 ps_1.6.0 fs_1.5.2
[34] fansi_1.0.3 pkgbuild_1.3.1 profvis_0.3.7
[37] tools_4.1.0 prettyunits_1.1.1 lifecycle_1.0.3
[40] Rhdf5lib_1.16.0 munsell_0.5.0 DelayedArray_0.20.0
[43] callr_3.7.0 compiler_4.1.0 rlang_1.0.6
[46] RCurl_1.98-1.9 rhdf5filters_1.6.0 htmlwidgets_1.5.4
[49] miniUI_0.1.1.1 bitops_1.0-7 DBI_1.1.3
[52] R6_2.5.1 dplyr_1.0.8 fastmap_1.1.0
[55] utf8_1.2.2 stringi_1.7.8 parallel_4.1.0
[58] vctrs_0.5.1 tidyselect_1.1.2 urlchecker_1.0.1

rcorces commented 1 year ago

Hi @fe4960! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces commented 1 year ago

Not all requested information has been supplied. Closing due to inactivity.

rcorces commented 1 year ago

Sorry, my comment above was meant for a different post. Please ignore it. I will reply shortly.

rcorces commented 1 year ago

It's hard for me to diagnose exactly why this is happening without being able to reproduce the error. Could you provide a reproducible example?

It's possible that there is a bug in the code such that, if one of your Arrow files doesn't have any cells represented in the cells parameter, this error occurs. Could you show me the breakdown of your Arrow files and the number of cells represented in each one?

fe4960 commented 1 year ago

Thanks a lot for the reply. Could you let me know the command to show the number of cells in the arrow files? Thanks!

rcorces commented 1 year ago

Sorry for the delay. I would use table(). For example:

table(ArchRProj@cellColData$Sample)

You'll also need to show the number of cells per sample within the subset of cells you are looking at. For example:

table(ArchRProj@cellColData$Sample[which(getCellNames(ArchRProj) %in% final_cell)])
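One caveat when comparing the two tables: table() on a character vector only counts values that actually occur, so a sample whose Arrow file contributes zero cells to the subset is simply missing from the second table rather than listed with a count of 0. The sketch below makes those samples explicit. samplesWithoutKeptCells() is a hypothetical helper written for illustration (it is not part of ArchR), and the toy vectors stand in for the project's sample column, cell names, and final_cell.

```r
# Hypothetical helper, not part of ArchR: list samples (Arrow files) that
# contribute zero cells to a subset. table() on a character vector drops
# values that never occur, so such samples vanish instead of showing 0.
samplesWithoutKeptCells <- function(cellSamples, cellNames, keepCells) {
  cellSamples <- as.character(cellSamples)
  keep <- cellNames %in% keepCells
  # A factor carrying every sample as a level makes table() report zeros too
  allSamples <- sort(unique(cellSamples))
  keptCounts <- table(factor(cellSamples[keep], levels = allSamples))
  names(keptCounts)[keptCounts == 0]
}

# Toy data standing in for the Sample column, cell names, and final_cell
samples <- c("A", "A", "B", "C")
cells <- c("A#c1", "A#c2", "B#c3", "C#c4")
samplesWithoutKeptCells(samples, cells, c("A#c1", "C#c4"))
# [1] "B"
```

With an ArchR project, the call would presumably be samplesWithoutKeptCells(proj1@cellColData$Sample, getCellNames(proj1), final_cell), reusing only the accessors already shown in this thread.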

fe4960 commented 1 year ago

Thanks for the reply!

The cell number of each sample in the original ArchR object:

> table(proj1@cellColData$Sample)

          19_D003_lobe        19_D003_macular 19_D003_macular_NeuN_1 
                   103                    152                     40 
19_D003_macular_NeuN_2 19_D003_macular_NeuN_3           19_D005_lobe 
                    46                     56                     97 
       19_D005_macular           19_D006_lobe        19_D006_macular 
                   333                     90                    396 
19_D006_macular_NeuN_1 19_D006_macular_NeuN_2 19_D006_macular_NeuN_3 
                     5                      6                     12 
          19_D007_lobe        19_D007_macular           19_D008_lobe 
                   139                    176                    144 
       19_D008_macular           19_D009_lobe        19_D009_macular 
                   148                    161                    213 
          19_D010_lobe        19_D010_macular 19_D010_macular_NeuN_1 
                    38                    263                     23 
19_D010_macular_NeuN_2 19_D010_macular_NeuN_3           19_D011_lobe 
                    13                     14                     82 
       19_D011_macular           19_D019_lobe        19_D019_macular 
                   479                    129                    227 
19_D019_macular_NeuN_1 19_D019_macular_NeuN_2 19_D019_macular_NeuN_3 
                     3                      2                      1 
          19D013_fovea         19D013_macular           19D014_fovea 
                   188                    547                    388 
        19D016_macular           D005_13_lobe        D005_13_macular 
                   691                     72                    123 
          D009_13_lobe        D009_13_macular           D013_13_lobe 
                    39                    232                    136 
       D013_13_macular           D017_13_lobe        D017_13_macular 
                   357                     57                    180 
          D018_13_lobe        D018_13_macular           D019_13_lobe 
                   100                    248                    125 
       D019_13_macular           D021_13_lobe        D021_13_macular 
                   511                     59                    258 
          D026_13_lobe        D026_13_macular           D027_13_lobe 
                    90                    154                     83 
       D027_13_macular           D028_13_lobe        D028_13_macular 
                   262                    112                    175 
D028_13_macular_NeuN_1 D028_13_macular_NeuN_2           D030_13_lobe 
                     1                      1                     89 
       D030_13_macular         GSM5567523_Hu5         GSM5567524_Hu7 
                    41                    128                    125 
        GSM5567533_Hu8 
                   180 

The number of cells per sample within the subset of cells:

> table(proj1@cellColData$Sample[which(getCellNames(proj1) %in% final_cell)])

          19_D003_lobe        19_D003_macular 19_D003_macular_NeuN_1 
                   103                    152                     40 
19_D003_macular_NeuN_2 19_D003_macular_NeuN_3           19_D005_lobe 
                    46                     56                     97 
       19_D005_macular           19_D006_lobe        19_D006_macular 
                   332                     90                    396 
19_D006_macular_NeuN_1 19_D006_macular_NeuN_2 19_D006_macular_NeuN_3 
                     5                      6                     12 
          19_D007_lobe        19_D007_macular           19_D008_lobe 
                   139                    176                    144 
       19_D008_macular           19_D009_lobe        19_D009_macular 
                   148                    161                    213 
          19_D010_lobe        19_D010_macular 19_D010_macular_NeuN_1 
                    38                    262                     23 
19_D010_macular_NeuN_2 19_D010_macular_NeuN_3           19_D011_lobe 
                    13                     14                     82 
       19_D011_macular           19_D019_lobe        19_D019_macular 
                   479                    129                    227 
19_D019_macular_NeuN_1 19_D019_macular_NeuN_2           19D013_fovea 
                     3                      2                    188 
        19D013_macular           19D014_fovea         19D016_macular 
                   547                    388                    691 
          D005_13_lobe        D005_13_macular           D009_13_lobe 
                    72                    123                     39 
       D009_13_macular           D013_13_lobe        D013_13_macular 
                   231                    136                    357 
          D017_13_lobe        D017_13_macular           D018_13_lobe 
                    57                    180                    100 
       D018_13_macular           D019_13_lobe        D019_13_macular 
                   223                    125                    504 
          D021_13_lobe        D021_13_macular           D026_13_lobe 
                    58                    258                     90 
       D026_13_macular           D027_13_lobe        D027_13_macular 
                   153                     83                    262 
          D028_13_lobe        D028_13_macular           D030_13_lobe 
                   112                    174                     89 
       D030_13_macular         GSM5567523_Hu5         GSM5567524_Hu7 
                    40                    128                    125 
        GSM5567533_Hu8 
                   180 

The file size of 19_D010_macular_NeuN_1.arrow is 37015914, much smaller than the other Arrow files. I don't know whether this file is causing the error.

rcorces commented 1 year ago

This is not related to what I suggested previously: subsetArchRProject() behaves as expected when the subset leaves some Arrow files without cells. So far, I'm unable to recapitulate this error on the tutorial data.

I've taken my best guess at a solution, based solely on your error message. That change has been implemented on the dev_idxKeep branch. Please test it by installing that branch as shown below, and report back on the outcome.

devtools::install_github("GreenleafLab/ArchR", ref="dev_idxKeep", repos = BiocManager::repositories(), upgrade = "never")
# Unload the currently loaded ArchR, then load the newly installed version
detach("package:ArchR", unload=TRUE)
library(ArchR)

fe4960 commented 1 year ago

Thanks for the help. I installed the "dev_idxKeep" branch, but it still shows an error:

proj2=subsetArchRProject(
  ArchRProj = proj1,
  cells = final_cell,
  outputDirectory = dir,
  dropCells = TRUE)
Copying ArchRProject to new outputDirectory : /storage/chenlab/Users/junwang/human_meta/data/proj4_clean1_HC_final
Copying Arrow Files...
Error in .safelapply(seq_along(inArrows), function(x) { :
  Error Found Iteration 1 : [1] "Error in .h5read(inArrow, h5name)[idxKeep, ] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[idxKeep, ]: incorrect number of dimensions>
  Error Found Iteration 2 : [1] "Error in .h5read(inArrow, h5name)[idxKeep, ] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[idxKeep, ]: incorrect number of dimensions>
  Error Found Iteration 3 : [1] "Error in .h5read(inArrow, h5name)[idxKeep, ] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[idxKeep, ]: incorrect number of dimensions>
  Error Found Iteration 4 : [1] "Error in .h5read(inArrow, h5name)[idxKeep, ] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[idxKeep, ]: incorrect number of dimensions>
  Error Found Iteration 5 : [1] "Error in .h5read(inArrow, h5name)[idxKeep, ] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[idxKeep, ]: in
In addition: Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
  58 function calls resulted in an error

rcorces commented 1 year ago

Sorry, maybe I got the column/row order wrong. I made another change. Could you re-install dev_idxKeep and try one more time?

devtools::install_github("GreenleafLab/ArchR", ref="dev_idxKeep", repos = BiocManager::repositories(), upgrade = "never")
# Unload the currently loaded ArchR, then load the newly installed version
detach("package:ArchR", unload=TRUE)
library(ArchR)

fe4960 commented 1 year ago

I have re-installed the dev_idxKeep branch. It now shows the error below. Could you help fix it? Thanks a lot!

proj2=subsetArchRProject(ArchRProj = proj1, cells = final_cell, outputDirectory = dir, dropCells = TRUE)
Copying ArchRProject to new outputDirectory : /storage/chenlab/Users/junwang/human_meta/data/proj4_clean1_HC_final
Copying Arrow Files...
Error in .safelapply(seq_along(inArrows), function(x) { :
  Error Found Iteration 1 : [1] "Error in .h5read(inArrow, h5name)[, idxKeep] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[, idxKeep]: incorrect number of dimensions>
  Error Found Iteration 2 : [1] "Error in .h5read(inArrow, h5name)[, idxKeep] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[, idxKeep]: incorrect number of dimensions>
  Error Found Iteration 3 : [1] "Error in .h5read(inArrow, h5name)[, idxKeep] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[, idxKeep]: incorrect number of dimensions>
  Error Found Iteration 4 : [1] "Error in .h5read(inArrow, h5name)[, idxKeep] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[, idxKeep]: incorrect number of dimensions>
  Error Found Iteration 5 : [1] "Error in .h5read(inArrow, h5name)[, idxKeep] : \n incorrect number of dimensions\n"
  <simpleError in .h5read(inArrow, h5name)[, idxKeep]: in
In addition: Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
  58 function calls resulted in an error

rcorces commented 1 year ago

As my blind attempts to fix this have failed, I don't have much more to offer at this time. I think you will need to step through the code manually and determine the value of idxKeep during the iteration where it fails.
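The error messages point to a shape mismatch. In the first run, R dispatched to [.data.frame, so at least one dataset read by .h5read() in iteration 51 came back as a data.frame, and on a data.frame a single index like x[idxKeep] selects columns, not cells. In the dev_idxKeep runs, both x[idxKeep, ] and x[, idxKeep] failed with "incorrect number of dimensions" on every Arrow file, which is what R reports when a plain vector is indexed with two indices. Below is a minimal base-R sketch of a shape-aware subset. subsetAlongCells() is a hypothetical helper, not ArchR code, and it assumes that cells are the rows of any 2-D result. That assumption has not been confirmed for Arrow files.

```r
# Hypothetical sketch, not ArchR's code: keep the cells indexed by idxKeep
# regardless of the shape of the object read from an Arrow (HDF5) file.
# ASSUMPTION: for 2-D results (data.frame or matrix), cells are the rows.
subsetAlongCells <- function(x, idxKeep) {
  if (length(dim(x)) < 2) {
    # Plain vector or 1-D array: a single index selects elements
    x[idxKeep]
  } else {
    # data.frame/matrix: index rows explicitly, because x[idxKeep] on a
    # data.frame would select columns instead
    x[idxKeep, , drop = FALSE]
  }
}

idxKeep <- c(1, 3)
v <- c(10, 20, 30)
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Base-R reproductions of the two error messages reported in this thread
tryCatch(df[idxKeep], error = conditionMessage)   # "undefined columns selected"
tryCatch(v[idxKeep, ], error = conditionMessage)  # "incorrect number of dimensions"

# The shape-aware helper handles both cases
subsetAlongCells(v, idxKeep)   # 10 30
subsetAlongCells(df, idxKeep)  # rows 1 and 3 of df
```

To find the dataset that triggers the data.frame case, one option is to switch to a single thread (for example with addArchRThreads(threads = 1)) so the Arrow files are processed serially. This assumes the copy step follows the global ArchR thread setting. Next, call debugonce(ArchR:::.copyArrows) on the internal function named in the traceback. While stepping through, inspect idxKeep and the class of the object returned by .h5read() at the failing iteration.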