GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

saveArchRProject to a new directory doesn't update GroupCoverage file path metadata #529

Closed bnprks closed 2 years ago

bnprks commented 3 years ago

If I make a project and run saveArchRProject on it with a new directory, the GroupCoverage metadata still points to absolute file paths in the old project directory. Although the GroupCoverage files themselves got copied over, the new ArchRProject will fail to find the GroupCoverage data if I delete the old directory.

As a longer-term enhancement it would be nice to transition to using relative paths in ArchRProject objects so that it's possible to simply copy a project directory and get an independent copy of the project.

To reproduce:

# proj: an existing ArchRProject in "my_old_directory" with GroupCoverage data set on Clusters
proj2 <- saveArchRProject(ArchRProj = proj, outputDirectory = "my_new_directory")
proj2@projectMetadata$GroupCoverages$Clusters$coverageMetadata
#> DataFrame with 16 rows and 5 columns
#>           Group        Name                                                                            File    nCells nInsertions
#>     <character> <character>                                                                     <character> <integer>   <numeric>
#> 1            C1   C1._.Rep1 1    /my_old_directory/GroupCoverages/Clusters/C1._.Rep1.insertions.coverage.h5        25      174698
#> 2            C1   C1._.Rep2 2    /my_old_directory/GroupCoverages/Clusters/C1._.Rep2.insertions.coverage.h5        23      227994
#> 3            C2  C2._.lane1 3   /my_old_directory/GroupCoverages/Clusters/C2._.lane1.insertions.coverage.h5        64      518412
#> 4            C2  C2._.lane2 4   /my_old_directory/GroupCoverages/Clusters/C2._.lane2.insertions.coverage.h5        62      488366
#> 5            C3  C3._.lane2 5   /my_old_directory/GroupCoverages/Clusters/C3._.lane2.insertions.coverage.h5       192     1119588
#> ...         ...         ... ...                                                                         ...       ...         ...
#> 12           C6  C6._.lane2 12  /my_old_directory/GroupCoverages/Clusters/C6._.lane2.insertions.coverage.h5       187     2094356
#> 13           C7  C7._.lane2 13  /my_old_directory/GroupCoverages/Clusters/C7._.lane2.insertions.coverage.h5       417     4316490
#> 14           C7  C7._.lane1 14  /my_old_directory/GroupCoverages/Clusters/C7._.lane1.insertions.coverage.h5       376     3748336
#> 15           C8  C8._.lane2 15  /my_old_directory/GroupCoverages/Clusters/C8._.lane2.insertions.coverage.h5       368     2662700
#> 16           C8  C8._.lane1 16  /my_old_directory/GroupCoverages/Clusters/C8._.lane1.insertions.coverage.h5       337     2580612
rcorces commented 3 years ago

Thanks for reporting. I agree that relative paths would be more stable, especially considering how end users would share projects with each other. I would consider this a high-priority bug/enhancement. @jgranja24 - thoughts?

jgranja24 commented 3 years ago

Thanks for the bug report. ArchR stores paths as relative when saving a project and converts them to absolute when the project is loaded. I will look into the coverages issue sometime soon.
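The save-as-relative, load-as-absolute behaviour described above can be sketched in base R. This is an illustration only and not ArchR's actual implementation: the helper names `to_relative` and `to_absolute` are hypothetical, and the sketch works on path strings alone without touching the file system.

```r
# Hypothetical helpers (not part of ArchR) illustrating the idea of
# storing paths relative to the project directory and resolving them on load.

# Strip the project directory prefix so the path can be stored portably.
to_relative <- function(paths, project_dir) {
  prefix <- paste0(sub("/+$", "", project_dir), "/")
  # Refuse to convert paths that live outside the project directory.
  stopifnot(all(startsWith(paths, prefix)))
  substring(paths, nchar(prefix) + 1)
}

# Re-anchor stored relative paths under whatever directory the project
# currently lives in.
to_absolute <- function(rel_paths, project_dir) {
  file.path(sub("/+$", "", project_dir), rel_paths)
}

old_path <- "/my_old_directory/GroupCoverages/Clusters/C1._.Rep1.insertions.coverage.h5"
rel_path <- to_relative(old_path, "/my_old_directory")
new_path <- to_absolute(rel_path, "/my_new_directory")
print(rel_path)
print(new_path)
```

With this scheme, copying the project directory is enough to get an independent copy, because nothing stored on disk refers to the old location.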

cschmidl commented 3 years ago

I have the same problem: the paths to the GroupCoverages are not updated after save and load. Thanks for your help!

ankushs0128 commented 3 years ago

Following on from #716, could you please take this up at your earliest convenience? It is hampering our ability to share project data with collaborators.

Thanks in advance!

rcorces commented 2 years ago

This issue has now been addressed on dev via https://github.com/GreenleafLab/ArchR/commit/293d20fcd199eb3964d8e6606fd70923622bfdde and is slated for incorporation into the stable release_1.0.3 shortly.

Gavin-Yinld commented 1 year ago

Hello, I copied the ArchR output to another computer, but the plotEmbedding function does not work there because of the working directory stored in the ArchR object.

<simpleError in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE): no 'dimnames[[.]]': cannot use character indexing>

ArchR-plotEmbedding-6f082ce35ead-Date-2023-04-14_Time-15-42-09.log

evaham1 commented 1 year ago

Hello @jgranja24 @rcorces,

I am also seeing the same problem: my GroupCoverage file paths don't update. I installed ArchR 1.0.3 from the dev branch using devtools::install_github('GreenleafLab/ArchR', ref='dev', repos = BiocManager::repositories()), but I still get the same error when I run saveArchRProject once I no longer have access to the old data directory.

Error message:

Error in saveArchRProject(ArchRProj = ArchR, outputDirectory = paste0(rds_path,  : 
  all(file.exists(zfiles)) is not TRUE

From inspecting ArchR@projectMetadata$GroupCoverages[[1]]$coverageMetadata$File when using ArchR_1.0.2 and ArchR_1.0.3 I can see that the coverage paths are still the same. Am I missing a step somewhere to update the group coverage paths?
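As a possible stopgap when the old directory is no longer known or available, the stale paths could be re-rooted by hand. The sketch below is an assumption-laden workaround, not an ArchR feature: `remap_coverage_paths` is a hypothetical helper, it assumes every path contains a `GroupCoverages/` component, and the demo uses a temporary directory in place of a real project.

```r
# Hypothetical helper (not part of ArchR): keep everything from the
# "GroupCoverages/" component onward and re-root it under new_dir.
remap_coverage_paths <- function(files, new_dir) {
  # Assumption: every coverage path contains a "GroupCoverages/" component.
  stopifnot(all(grepl("GroupCoverages/", files, fixed = TRUE)))
  rel <- sub("^.*?(GroupCoverages/.*)$", "\\1", files, perl = TRUE)
  file.path(new_dir, rel)
}

# Demo: a temporary directory stands in for the copied project directory.
new_dir <- file.path(tempdir(), "my_new_directory")
cov_dir <- file.path(new_dir, "GroupCoverages", "Clusters")
dir.create(cov_dir, recursive = TRUE, showWarnings = FALSE)
fname <- "C1._.Rep1.insertions.coverage.h5"
invisible(file.create(file.path(cov_dir, fname)))

# A stale path as it would appear in the coverage metadata.
stale <- file.path("/my_old_directory", "GroupCoverages", "Clusters", fname)
fixed <- remap_coverage_paths(stale, new_dir)

# Report which paths resolve on this machine.
print(data.frame(path = c(stale, fixed), exists = file.exists(c(stale, fixed))))

# Untested assumption about how this might be applied to a real project:
# md <- ArchR@projectMetadata$GroupCoverages[[1]]$coverageMetadata
# md$File <- remap_coverage_paths(md$File, "/path/to/new/project")
# ArchR@projectMetadata$GroupCoverages[[1]]$coverageMetadata <- md
```

Checking `file.exists()` on the rewritten paths before saving should show whether the `all(file.exists(zfiles))` check is likely to pass.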

I would also like to reiterate the point above: it would be great to have this fix incorporated into a stable release, not just for sharing data but also for running ArchR in custom pipelines.

Thanks for your help!

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SeuratObject_4.1.3          Seurat_4.3.0                presto_1.0.0                pheatmap_1.0.12            
 [5] hexbin_1.28.2               GenomicFeatures_1.46.5      AnnotationDbi_1.56.2        forcats_0.5.1              
 [9] dplyr_1.1.2                 purrr_1.0.1                 readr_2.1.4                 tidyr_1.3.0                
[13] tibble_3.2.1                tidyverse_1.3.1             rhdf5_2.38.1                SummarizedExperiment_1.24.0
[17] Biobase_2.54.0              MatrixGenerics_1.6.0        Rcpp_1.0.10                 Matrix_1.5-4               
[21] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.4           
[25] BiocGenerics_0.40.0         matrixStats_1.0.0           data.table_1.14.8           stringr_1.5.0              
[29] plyr_1.8.8                  magrittr_2.0.3              ggplot2_3.4.2               gtable_0.3.3               
[33] gtools_3.9.4                gridExtra_2.3               ArchR_1.0.3                 getopt_1.20.3              
[37] BiocManager_1.30.20        

loaded via a namespace (and not attached):
  [1] utf8_1.2.3               spatstat.explore_3.2-1   reticulate_1.30          tidyselect_1.2.0         RSQLite_2.3.1           
  [6] htmlwidgets_1.6.2        BiocParallel_1.28.3      Rtsne_0.16               devtools_2.4.5           munsell_0.5.0           
 [11] codetools_0.2-18         ica_1.0-3                future_1.32.0            miniUI_0.1.1.1           withr_2.5.0             
 [16] spatstat.random_3.1-5    colorspace_2.1-0         progressr_0.13.0         filelock_1.0.2           rstudioapi_0.14         
 [21] ROCR_1.0-11              tensor_1.5               listenv_0.9.0            GenomeInfoDbData_1.2.7   polyclip_1.10-4         
 [26] bit64_4.0.5              rprojroot_2.0.3          parallelly_1.36.0        vctrs_0.6.3              generics_0.1.3          
 [31] BiocFileCache_2.2.1      R6_2.5.1                 bitops_1.0-7             rhdf5filters_1.6.0       spatstat.utils_3.0-3    
 [36] cachem_1.0.8             DelayedArray_0.20.0      assertthat_0.2.1         promises_1.2.0.1         BiocIO_1.4.0            
 [41] scales_1.2.1             Cairo_1.6-0              globals_0.16.2           processx_3.8.1           goftest_1.2-3           
 [46] rlang_1.1.1              splines_4.1.2            rtracklayer_1.54.0       lazyeval_0.2.2           spatstat.geom_3.2-1     
 [51] broom_0.7.12             yaml_2.3.7               reshape2_1.4.4           abind_1.4-5              modelr_0.1.8            
 [56] backports_1.4.1          httpuv_1.6.11            usethis_2.2.0            tools_4.1.2              ellipsis_0.3.2          
 [61] RColorBrewer_1.1-3       sessioninfo_1.2.2        ggridges_0.5.4           progress_1.2.2           zlibbioc_1.40.0         
 [66] RCurl_1.98-1.12          ps_1.7.5                 prettyunits_1.1.1        deldir_1.0-9             pbapply_1.7-0           
 [71] urlchecker_1.0.1         cowplot_1.1.1            zoo_1.8-12               haven_2.4.3              ggrepel_0.9.3           
 [76] cluster_2.1.2            fs_1.6.2                 scattermore_1.2          lmtest_0.9-40            reprex_2.0.1            
 [81] RANN_2.6.1               fitdistrplus_1.1-11      pkgload_1.3.2            hms_1.1.3                patchwork_1.1.2         
 [86] mime_0.12                xtable_1.8-4             XML_3.99-0.14            readxl_1.3.1             compiler_4.1.2          
 [91] biomaRt_2.50.3           KernSmooth_2.23-20       crayon_1.5.2             htmltools_0.5.5          later_1.3.1             
 [96] tzdb_0.4.0               lubridate_1.8.0          DBI_1.1.3                dbplyr_2.1.1             MASS_7.3-54             
[101] rappdirs_0.3.3           cli_3.6.1                igraph_1.5.0             pkgconfig_2.0.3          GenomicAlignments_1.30.0
[106] sp_1.6-1                 plotly_4.10.2            spatstat.sparse_3.0-1    xml2_1.3.4               XVector_0.34.0          
[111] rvest_1.0.2              callr_3.7.3              digest_0.6.31            sctransform_0.3.5        RcppAnnoy_0.0.20        
[116] spatstat.data_3.0-1      Biostrings_2.62.0        cellranger_1.1.0         leiden_0.4.3             uwot_0.1.14             
[121] restfulr_0.0.15          curl_5.0.1               shiny_1.7.4              Rsamtools_2.10.0         rjson_0.2.21            
[126] nlme_3.1-153             lifecycle_1.0.3          jsonlite_1.8.5           Rhdf5lib_1.16.0          desc_1.4.2              
[131] viridisLite_0.4.2        fansi_1.0.4              pillar_1.9.0             lattice_0.20-45          pkgbuild_1.4.1          
[136] KEGGREST_1.34.0          fastmap_1.1.1            httr_1.4.6               survival_3.2-13          remotes_2.4.2           
[141] glue_1.6.2               png_0.1-8                bit_4.0.5                profvis_0.3.8            stringi_1.7.12          
[146] blob_1.2.4               memoise_2.0.1            irlba_2.3.5.1            future.apply_1.11.0