GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

simpleError in H5Lcreate_external when creating Arrow Files #352

Closed PeggySze closed 3 years ago

PeggySze commented 4 years ago

Hi, I encountered an error when creating Arrow files. The error output is:

2020-10-19 12:13:39 : ERROR Found in .tabixToTmp for (atac_3dpa : 3 of 3) 
LogFile = ArchRLogs/ArchR-createArrows-2f9331ab83c12-Date-2020-10-19_Time-10-03-34.log

<simpleError in H5Lcreate_external(target_file_name = tmpChrFilei, target_obj_name = h5ls(tmpChrFilei)$name[2],     link_loc 
= fid, link_name = paste0("Fragments/", chunkName[1],         "/", group[2])): HDF5. Links. Unable to initialize object.>

2020-10-19 12:13:39 : errorList, Class = list

And here is my R session information:

R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Smediterranea.PlanMine.ddSmedG4_1.0.0
 [2] BSgenome_1.54.0                               
 [3] rtracklayer_1.46.0                            
 [4] Biostrings_2.54.0                             
 [5] XVector_0.26.0                                
 [6] ArchR_0.9.4                                   
 [7] magrittr_1.5                                  
 [8] rhdf5_2.30.1                                  
 [9] Matrix_1.2-18                                 
[10] data.table_1.13.0                             
[11] SummarizedExperiment_1.16.1                   
[12] DelayedArray_0.12.3                           
[13] BiocParallel_1.20.1                           
[14] matrixStats_0.57.0                            
[15] Biobase_2.46.0                                
[16] ggplot2_3.3.2                                 
[17] GenomicRanges_1.38.0                          
[18] GenomeInfoDb_1.22.0                           
[19] IRanges_2.20.2                                
[20] S4Vectors_0.24.4                              
[21] BiocGenerics_0.34.0                           

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3               pillar_1.4.6             compiler_3.6.0          
 [4] bitops_1.0-6             tools_3.6.0              zlibbioc_1.32.0         
 [7] lifecycle_0.2.0          tibble_3.0.4             gtable_0.3.0            
[10] lattice_0.20-38          pkgconfig_2.0.3          rlang_0.4.8             
[13] GenomeInfoDbData_1.2.2   stringr_1.4.0            withr_2.1.2             
[16] dplyr_1.0.2              generics_0.0.2           vctrs_0.3.4             
[19] grid_3.6.0               tidyselect_1.1.0         glue_1.4.2              
[22] R6_2.4.1                 XML_3.99-0.3             Rhdf5lib_1.8.0          
[25] purrr_0.3.3              GenomicAlignments_1.22.1 Rsamtools_2.2.3         
[28] scales_1.1.0             ellipsis_0.3.0           colorspace_1.4-1        
[31] stringi_1.4.3            RCurl_1.95-4.12          munsell_0.5.0           
[34] crayon_1.3.4            

I have also attached my ArchR log: ArchR-createArrows-2f9331ab83c12-Date-2020-10-19_Time-10-03-34.log. Does anyone know how to fix this problem? Thanks a lot!

rcorces commented 4 years ago

Can you recapitulate this error with the tutorial dataset?

jgranja24 commented 3 years ago

Hi @PeggySze, I am guessing this is a subThreading problem related to your OS. Please try subThreading = FALSE or threads = 1. I believe this will solve your issue.

Best

Jeff
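
For reference, here is a minimal sketch of how the two suggested settings might be passed to createArrowFiles(). The fragment file path and sample name are placeholders (they are not from this thread), and the genome and gene annotations are assumed to have been set up beforehand as in your usual workflow. The ArchR call is guarded, so the snippet does nothing if ArchR or the input file is missing.

```r
# Placeholder input (hypothetical path and sample name); replace with your own.
inputFiles <- c(mySample = "mySample.fragments.tsv.gz")

# Option A: keep multiple threads but disable sub-threading.
optionA <- list(threads = 16, subThreading = FALSE)
# Option B: run everything on a single thread.
optionB <- list(threads = 1)

# Combine the shared arguments with one of the options above.
arrowArgs <- c(
  list(inputFiles = inputFiles, sampleNames = names(inputFiles)),
  optionA
)

# Guarded call: only runs when ArchR is installed and the input file exists.
if (requireNamespace("ArchR", quietly = TRUE) && all(file.exists(inputFiles))) {
  suppressPackageStartupMessages(library(ArchR))
  # Assumes the genome and gene annotations were already registered,
  # e.g. via addArchRGenome() or a custom annotation for your organism.
  ArrowFiles <- do.call(createArrowFiles, arrowArgs)
}
```

Swapping optionA for optionB in arrowArgs gives the single-threaded variant.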

angleYuan commented 3 years ago

Hi everyone, I encountered the same multithreading issue with my own data but not with the tutorial data on the same server, so I am not sure it is a problem with my OS. I was able to bypass the issue with my data by setting threads = 1, but it becomes really slow (too slow for troubleshooting other problems...). I attached the log files for both my data (threads = 16) and the tutorial data (threads = 16). Do you have any idea why I cannot use multiple threads with my data? It would be great if multithreading were an option for me. Thanks!

thread16_mydata_ArchR-createArrows-86215ce9e2e3-Date-2020-11-03_Time-20-59-56.log thread16_tutorial_ArchR-createArrows-d86a792b956b-Date-2020-11-04_Time-10-09-48.log

rcorces commented 3 years ago

@angleYuan - have you tried using subThreading = FALSE instead of threads = 1?

angleYuan commented 3 years ago

Hi @rcorces, I only had one sample, so if I understand correctly, subThreading = FALSE and threads = 1 would be the same in my case.

rcorces commented 3 years ago

I don't believe that is the correct interpretation. I think parallelization happens on a per-chromosome basis when creating Arrow files. One possible explanation for the issues with HDF5 files is that some systems handle subThreading differently or don't allow it (e.g., cluster environments). Please try it and let us know what happens.

angleYuan commented 3 years ago

Thank you for the explanation. I will give it a try!

angleYuan commented 3 years ago

Hi @rcorces, it did work when I used multiple threads (threads = 16) and set subThreading = FALSE. Thanks for your suggestions. I have to say the improvement in processing time is not very significant, but it is at least something.

hcph commented 3 years ago

I ran into the same issue. The tutorial dataset runs successfully even with threads = 16, but with my own data, threads = 1 and subThreading = FALSE did not work.

hcph commented 3 years ago

I fixed this issue by using threads = 1 and subThreading = FALSE together. Thanks!
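
The settings reported in this thread can be tried in order, from fastest to most conservative. Below is a rough sketch of that idea. tryLadder is a hypothetical helper (not part of ArchR), the input path is a placeholder, and the sketch assumes that a failed createArrowFiles() run surfaces as an ordinary R error. Partial output from a failed attempt may need to be cleaned up before retrying.

```r
# Settings reported in this thread, from fastest to most conservative.
attempts <- list(
  list(threads = 16, subThreading = TRUE),
  list(threads = 16, subThreading = FALSE),
  list(threads = 1,  subThreading = FALSE)
)

# Hypothetical helper: call `fun` with each set of arguments in turn and
# return the first result that does not raise an error.
tryLadder <- function(fun, attempts) {
  for (a in attempts) {
    res <- tryCatch(do.call(fun, a), error = function(e) e)
    if (!inherits(res, "error")) {
      return(list(settings = a, result = res))
    }
    message("Failed with threads = ", a$threads,
            ", subThreading = ", a$subThreading,
            ": ", conditionMessage(res))
  }
  stop("All settings failed")
}

# Placeholder input (hypothetical path and sample name); replace with your own.
inputFiles <- c(mySample = "mySample.fragments.tsv.gz")

# Guarded usage: only runs when ArchR is installed and the input file exists.
if (requireNamespace("ArchR", quietly = TRUE) && all(file.exists(inputFiles))) {
  suppressPackageStartupMessages(library(ArchR))
  # Assumes genome and gene annotations were registered beforehand.
  runArrows <- function(threads, subThreading) {
    createArrowFiles(
      inputFiles   = inputFiles,
      sampleNames  = names(inputFiles),
      threads      = threads,
      subThreading = subThreading
    )
  }
  outcome <- tryLadder(runArrows, attempts)
}
```

The returned settings element records which combination finally succeeded, which is useful to note when reporting the problem.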

drowsygoat commented 5 months ago

Has anyone else encountered this error on macOS? I have.