MagpiePKU / EpiTrace

Cell age determination by scATAC-seq and bulk-ATAC-seq
https://epitrace.readthedocs.io
GNU General Public License v3.0

EpiTraceAge_Convergence, Error in download.file #3

Open moontreegy opened 6 months ago

moontreegy commented 6 months ago

Hi,

I encountered a problem when running EpiTraceAge_Convergence in step 4 of the tutorial. It seems to be caused by functions that try to download chromosome information. Since our server cannot connect to the network, is it possible to pre-fetch the seqinfo object? Could you advise which code I should modify to load the object?

Thank you in advance.

Below is the traceback:

please make double sure your ref genome, peak set and cells are similar.

Preparing obj...

ref clock list is not standard. Please make sure the input data, peak set and clock set are in similar reference genome.

Input peakset is set to be hg19

Joining with `by = join_by(Clock_panel)`
**Error in download.file(url, destfile, quiet = TRUE): cannot open URL 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz'**
Traceback:

1. EpiTraceAge_Convergence(peakSet = init_gr, matrix = init_mm, 
 .     ref_genome = "mm10", clock_gr = mouse_clock_by_MM285, iterative_time = 5, 
 .     min.cutoff = 0, non_standard_clock = T, qualnum = 10, ncore_lim = 48, 
 .     mean_error_limit = 0.1)
2. EpiTrace_prepare_object(initial_peakSet_clk, initial_matrix_clk, 
 .     celltype, ref_genome = "hg19", non_standard_clock = T, clock_gr_list = iterative_GR_list, 
 .     sep_string = sep_string, fn.k.param = fn.k.param, lsi_dim = lsi_dim, 
 .     qualnum = qualnum, min.cutoff = min.cutoff, run_reduction = F, 
 .     remove_peaks_number = remove_peaks_number)
3. Signac::CreateChromatinAssay(matrix, sep = sep_string, genome = ref_genome, 
 .     ranges = peakSet)
4. as.ChromatinAssay(x = seurat.assay, ranges = ranges, seqinfo = genome, 
 .     motifs = motifs, fragments = frags, annotation = annotation, 
 .     bias = bias, positionEnrichment = positionEnrichment)
5. as.ChromatinAssay.Assay(x = seurat.assay, ranges = ranges, seqinfo = genome, 
 .     motifs = motifs, fragments = frags, annotation = annotation, 
 .     bias = bias, positionEnrichment = positionEnrichment)
6. SetAssayData(object = new.assay, slot = "seqinfo", new.data = seqinfo)
7. SetAssayData.ChromatinAssay(object = new.assay, slot = "seqinfo", 
 .     new.data = seqinfo)
8. Seqinfo(genome = new.data)
9. .make_Seqinfo_from_genome(genome)
10. getChromInfoFromUCSC(genome, as.Seqinfo = TRUE)
11. .get_chrom_info_for_registered_UCSC_genome(script_path, assembled.molecules.only = assembled.molecules.only, 
  .     map.NCBI = map.NCBI, add.ensembl.col = add.ensembl.col, goldenPath.url = goldenPath.url, 
  .     recache = recache)
12. .get_raw_chrom_info_for_registered_UCSC_genome(GENOME, ASSEMBLED_MOLECULES, 
  .     vars$CIRC_SEQS, FETCH_ORDERED_CHROM_SIZES = vars$FETCH_ORDERED_CHROM_SIZES, 
  .     assembled.molecules.only = assembled.molecules.only, goldenPath.url = goldenPath.url, 
  .     recache = recache)
13. .fetch_raw_chrom_info_from_UCSC(GENOME, ASSEMBLED_MOLECULES, 
  .     CIRC_SEQS, FETCH_ORDERED_CHROM_SIZES, goldenPath.url = goldenPath.url)
14. FETCH_ORDERED_CHROM_SIZES(goldenPath.url = goldenPath.url)
15. GenomeInfoDb:::fetch_chrom_sizes_from_UCSC(GENOME, goldenPath.url = goldenPath.url)
16. .fetch_chrom_sizes_from_UCSC_database(genome, goldenPath.url = goldenPath.url)
17. fetch_table_from_UCSC_database(genome, "chromInfo", col2class = col2class, 
  .     goldenPath.url = goldenPath.url)
18. fetch_table_from_url(url, colnames = names(col2class), col2class = col2class)
19. suppressWarnings(download.file(url, destfile, quiet = TRUE))
20. withCallingHandlers(expr, warning = function(w) if (inherits(w, 
  .     classes)) tryInvokeRestart("muffleWarning"))
21. download.file(url, destfile, quiet = TRUE)

MagpiePKU commented 6 months ago

As a quick fix, please try installing and loading the BSgenome.Hsapiens.UCSC.hg19 package first.

The package can be downloaded from https://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.UCSC.hg19.html . Install the downloaded tarball with R CMD INSTALL BSgenome.Hsapiens.UCSC.hg19_1.4.3.tar.gz, then call library(BSgenome.Hsapiens.UCSC.hg19) before running the EpiTraceAge_Convergence code.

We plan to fix this in the next cycle.
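
On an offline server, the install step can also be done from within R. The following is a minimal sketch, assuming the tarball was downloaded on a networked machine and copied over; the path is hypothetical. Note that later replies in this thread report that loading the package alone did not remove the download error.

```r
# Hypothetical path: wherever the tarball was copied on the offline server.
tarball <- "BSgenome.Hsapiens.UCSC.hg19_1.4.3.tar.gz"

if (file.exists(tarball)) {
  # repos = NULL with type = "source" installs from the local file only;
  # dependencies such as BSgenome must already be installed.
  install.packages(tarball, repos = NULL, type = "source")
  installed <- requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
} else {
  message("Tarball not found: ", tarball)
  installed <- FALSE
}
```

`installed` is TRUE only if the package namespace can be loaded after the install attempt.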

moontreegy commented 6 months ago

> As a quick fix, please try installing and loading the BSgenome.Hsapiens.UCSC.hg19 package first.
>
> The package can be downloaded from https://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.UCSC.hg19.html . Install the downloaded tarball with R CMD INSTALL BSgenome.Hsapiens.UCSC.hg19_1.4.3.tar.gz, then call library(BSgenome.Hsapiens.UCSC.hg19) before running the EpiTraceAge_Convergence code.
>
> We plan to fix this in the next cycle.

I used library(BSgenome.Hsapiens.UCSC.hg19) and library(BSgenome.Mmusculus.UCSC.mm10), but it still shows the same error.

Below is my env.

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] ChIPseeker_1.34.1                 rhdf5_2.42.0                     
 [3] SummarizedExperiment_1.28.0       Biobase_2.58.0                   
 [5] MatrixGenerics_1.10.0             Rcpp_1.0.11                      
 [7] matrixStats_1.2.0                 data.table_1.14.8                
 [9] stringr_1.5.1                     plyr_1.8.9                       
[11] magrittr_2.0.3                    gtable_0.3.4                     
[13] gtools_3.9.5                      gridExtra_2.3                    
[15] ArchR_1.0.2                       patchwork_1.1.3                  
[17] ggtree_3.6.0                      SeuratObject_4.1.3               
[19] Seurat_4.3.0                      EpiTrace_0.0.1.3                 
[21] Matrix_1.5-3                      ggplot2_3.4.4                    
[23] openxlsx_4.2.5.2                  reshape2_1.4.4                   
[25] readr_2.1.4                       tidyr_1.3.0                      
[27] dplyr_1.1.4                       BSgenome.Hsapiens.UCSC.hg19_1.4.3
[29] BSgenome_1.66.3                   rtracklayer_1.58.0               
[31] Biostrings_2.66.0                 XVector_0.38.0                   
[33] GenomicRanges_1.50.2              GenomeInfoDb_1.34.9              
[35] IRanges_2.32.0                    S4Vectors_0.36.2                 
[37] BiocGenerics_0.44.0              

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3                         
  [2] pbdZMQ_0.3-9                           
  [3] scattermore_0.8                        
  [4] easyLift_0.2.1                         
  [5] bit64_4.0.5                            
  [6] knitr_1.42                             
  [7] irlba_2.3.5.1                          
  [8] DelayedArray_0.24.0                    
  [9] rpart_4.1.19                           
 [10] KEGGREST_1.38.0                        
 [11] RCurl_1.98-1.13                        
 [12] doParallel_1.0.17                      
 [13] generics_0.1.3                         
 [14] GenomicFeatures_1.50.4                 
 [15] preprocessCore_1.60.2                  
 [16] cowplot_1.1.1                          
 [17] RSQLite_2.3.4                          
 [18] shadowtext_0.1.2                       
 [19] RANN_2.6.1                             
 [20] future_1.32.0                          
 [21] enrichplot_1.18.0                      
 [22] bit_4.0.5                              
 [23] tzdb_0.4.0                             
 [24] xml2_1.3.3                             
 [25] spatstat.data_3.0-1                    
 [26] httpuv_1.6.9                           
 [27] viridis_0.6.2                          
 [28] xfun_0.38                              
 [29] hms_1.1.3                              
 [30] evaluate_0.20                          
 [31] promises_1.2.0.1                       
 [32] fansi_1.0.6                            
 [33] restfulr_0.0.15                        
 [34] progress_1.2.3                         
 [35] caTools_1.18.2                         
 [36] dbplyr_2.3.2                           
 [37] igraph_1.4.2                           
 [38] DBI_1.2.0                              
 [39] htmlwidgets_1.6.2                      
 [40] spatstat.geom_3.1-0                    
 [41] purrr_1.0.2                            
 [42] ellipsis_0.3.2                         
 [43] backports_1.4.1                        
 [44] biomaRt_2.54.1                         
 [45] deldir_1.0-6                           
 [46] vctrs_0.6.5                            
 [47] ROCR_1.0-11                            
 [48] abind_1.4-5                            
 [49] cachem_1.0.8                           
 [50] withr_2.5.2                            
 [51] ggforce_0.4.1                          
 [52] HDO.db_0.99.1                          
 [53] progressr_0.13.0                       
 [54] vroom_1.6.5                            
 [55] checkmate_2.1.0                        
 [56] sctransform_0.3.5                      
 [57] GenomicAlignments_1.34.1               
 [58] treeio_1.22.0                          
 [59] prettyunits_1.2.0                      
 [60] goftest_1.2-3                          
 [61] DOSE_3.24.0                            
 [62] cluster_2.1.4                          
 [63] ape_5.7                                
 [64] IRdisplay_1.1                          
 [65] lazyeval_0.2.2                         
 [66] crayon_1.5.2                           
 [67] spatstat.explore_3.1-0                 
 [68] pkgconfig_2.0.3                        
 [69] tweenr_2.0.2                           
 [70] nlme_3.1-162                           
 [71] nnet_7.3-18                            
 [72] rlang_1.1.2                            
 [73] globals_0.16.2                         
 [74] lifecycle_1.0.4                        
 [75] miniUI_0.1.1.1                         
 [76] filelock_1.0.2                         
 [77] BiocFileCache_2.6.0                    
 [78] polyclip_1.10-4                        
 [79] lmtest_0.9-40                          
 [80] aplot_0.1.10                           
 [81] IRkernel_1.3.2                         
 [82] boot_1.3-28.1                          
 [83] Rhdf5lib_1.20.0                        
 [84] zoo_1.8-12                             
 [85] base64enc_0.1-3                        
 [86] ggridges_0.5.4                         
 [87] png_0.1-8                              
 [88] viridisLite_0.4.2                      
 [89] rjson_0.2.21                           
 [90] bitops_1.0-7                           
 [91] KernSmooth_2.23-20                     
 [92] rhdf5filters_1.10.0                    
 [93] blob_1.2.4                             
 [94] qvalue_2.30.0                          
 [95] parallelly_1.35.0                      
 [96] spatstat.random_3.1-4                  
 [97] gridGraphics_0.5-1                     
 [98] scales_1.3.0                           
 [99] memoise_2.0.1                          
[100] ica_1.0-3                              
[101] gplots_3.1.3                           
[102] zlibbioc_1.44.0                        
[103] scatterpie_0.1.8                       
[104] compiler_4.2.2                         
[105] BiocIO_1.8.0                           
[106] RColorBrewer_1.1-3                     
[107] plotrix_3.8-2                          
[108] fitdistrplus_1.1-8                     
[109] Rsamtools_2.14.0                       
[110] cli_3.6.2                              
[111] listenv_0.9.0                          
[112] pbapply_1.7-0                          
[113] htmlTable_2.4.1                        
[114] Formula_1.2-5                          
[115] MASS_7.3-58.2                          
[116] WGCNA_1.72-1                           
[117] tidyselect_1.2.0                       
[118] stringi_1.8.3                          
[119] GOSemSim_2.24.0                        
[120] yaml_2.3.7                             
[121] ggrepel_0.9.3                          
[122] fastmatch_1.1-3                        
[123] tools_4.2.2                            
[124] future.apply_1.10.0                    
[125] rstudioapi_0.14                        
[126] uuid_1.1-0                             
[127] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[128] foreach_1.5.2                          
[129] foreign_0.8-84                         
[130] farver_2.1.1                           
[131] plyranges_1.18.0                       
[132] Rtsne_0.16                             
[133] ggraph_2.1.0                           
[134] digest_0.6.33                          
[135] shiny_1.7.4                            
[136] later_1.3.0                            
[137] RcppAnnoy_0.0.20                       
[138] httr_1.4.7                             
[139] AnnotationDbi_1.60.2                   
[140] colorspace_2.1-0                       
[141] XML_3.99-0.14                          
[142] fs_1.6.3                               
[143] tensor_1.5                             
[144] reticulate_1.28                        
[145] splines_4.2.2                          
[146] uwot_0.1.14                            
[147] yulab.utils_0.1.2                      
[148] RcppRoll_0.3.0                         
[149] tidytree_0.4.2                         
[150] spatstat.utils_3.0-2                   
[151] graphlayouts_0.8.4                     
[152] sp_1.6-0                               
[153] ggplotify_0.1.2                        
[154] plotly_4.10.1                          
[155] xtable_1.8-4                           
[156] jsonlite_1.8.8                         
[157] tidygraph_1.2.3                        
[158] dynamicTreeCut_1.63-1                  
[159] ggfun_0.0.9                            
[160] R6_2.5.1                               
[161] Hmisc_5.0-1                            
[162] pillar_1.9.0                           
[163] htmltools_0.5.5                        
[164] mime_0.12                              
[165] nnls_1.4                               
[166] glue_1.6.2                             
[167] fastmap_1.1.1                          
[168] BiocParallel_1.32.6                    
[169] codetools_0.2-19                       
[170] fgsea_1.24.0                           
[171] Signac_1.9.0                           
[172] utf8_1.2.4                             
[173] lattice_0.20-45                        
[174] spatstat.sparse_3.0-1                  
[175] tibble_3.2.1                           
[176] curl_5.2.0                             
[177] leiden_0.4.3                           
[178] zip_2.2.2                              
[179] GO.db_3.16.0                           
[180] survival_3.5-3                         
[181] rmarkdown_2.21                         
[182] repr_1.1.6                             
[183] munsell_0.5.0                          
[184] fastcluster_1.2.3                      
[185] GenomeInfoDbData_1.2.9                 
[186] iterators_1.0.14                       
[187] impute_1.72.0  

MagpiePKU commented 6 months ago

Initial ideas (untested, since we have a network connection here) would be:

  1. Remove lines 109-112
  2. In line 663, change ref_genome = "hg19" to ref_genome = ref_genome
  3. Follow the example in https://github.com/broadinstitute/ichorCNA/issues/84#issuecomment-1432623732 to build an mm10 seqinfo RDS
  4. Read that RDS into an object such as seqinfo
  5. Pass seqinfo to EpiTrace as EpiTraceAge_Convergence(peakSet = init_gr, matrix = init_mm, ref_genome = seqinfo, clock_gr = mouse_clock_by_MM285, iterative_time = 5, min.cutoff = 0, non_standard_clock = T, qualnum = 10, ncore_lim = 48, mean_error_limit = 0.1)

Could you check whether this resolves the problem? It appears to be a known GenomeInfoDb issue when there is no internet connection; see, for example, https://github.com/stuart-lab/signac/issues/249.

moontreegy commented 6 months ago

Hi,

I modified the script EpiTrace.R, generated an mm10 seqinfo object using the code below, and passed it as ref_genome = seqinfo:

genomeBuild = "mm10"
genomeStyle = "UCSC"
library(GenomeInfoDb)
bsg <- paste0("BSgenome.Mmusculus.UCSC.", genomeBuild)
if (!require(bsg, character.only=TRUE, quietly=TRUE, warn.conflicts=FALSE)) {
  # Fallback: this call needs network access to fetch chromosome info
  seqinfo <- Seqinfo(genome=genomeBuild)
} else {
  # Offline path: take the seqinfo from the installed BSgenome package
  seqinfo <- seqinfo(get(bsg))
}
seqlevelsStyle(seqinfo) <- genomeStyle

# Keep only chr1-chr19 and chrX
seqinfo <- keepSeqlevels(seqinfo, value = paste0("chr",c(1:19,"X")))

saveRDS(seqinfo, file = "/data/work/2024-05-15_EpiTrace/seqinfo_mm10_ucsc.rds")

but it now shows another error:

Error in match(x, table, nomatch = 0L): 'match' requires vector arguments
Traceback:

1. EpiTraceAge_Convergence(peakSet = init_gr, matrix = init_mm, 
 .     ref_genome = seqinfo, clock_gr = mouse_clock_by_MM285, iterative_time = 5, 
 .     min.cutoff = 0, non_standard_clock = T, qualnum = 10, ncore_lim = 48, 
 .     mean_error_limit = 0.1)
2. ref_genome %in% "hg38"
3. ref_genome %in% "hg38"

Could you please assist in testing it without an internet connection?

Many thanks
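
The second error is consistent with `%in%` being applied to a non-character object: `%in%` calls `match()`, which requires vector arguments, and a Seqinfo object is an S4 object rather than a vector. A minimal sketch of a guard follows, assuming a hypothetical helper `genome_label()` (not part of EpiTrace) that would be used wherever the code compares ref_genome with strings such as "hg38". The chromosome lengths are illustrative values used only to build a small Seqinfo without any download.

```r
library(GenomeInfoDb)

# Hypothetical helper, not part of EpiTrace: return the genome name as a
# character vector whether ref_genome is a string or a Seqinfo object.
genome_label <- function(ref_genome) {
  if (is.character(ref_genome)) {
    return(ref_genome)
  }
  if (is(ref_genome, "Seqinfo")) {
    # genome() returns one entry per sequence; keep the distinct non-NA names
    g <- unname(genome(ref_genome))
    return(unique(g[!is.na(g)]))
  }
  stop("ref_genome must be a genome name or a Seqinfo object")
}

# Small Seqinfo built from explicit values, so no download is attempted.
si <- Seqinfo(
  seqnames   = c("chr1", "chrX"),
  seqlengths = c(195471971, 171031299),  # illustrative mm10 lengths
  isCircular = c(FALSE, FALSE),
  genome     = "mm10"
)

is_hg38 <- genome_label(si) %in% "hg38"      # FALSE, and no match() error
is_mm10 <- genome_label("mm10") %in% "mm10"  # TRUE
```

With a guard like this, the string comparisons would see "mm10" while the Seqinfo object itself could still be passed on to Signac::CreateChromatinAssay, whose genome argument accepts a Seqinfo object. Whether other parts of EpiTrace need similar changes is not confirmed here.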

ellenketter commented 2 months ago

> As a quick fix, please try installing and loading the BSgenome.Hsapiens.UCSC.hg19 package first.
>
> The package can be downloaded from https://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.UCSC.hg19.html . Install the downloaded tarball with R CMD INSTALL BSgenome.Hsapiens.UCSC.hg19_1.4.3.tar.gz, then call library(BSgenome.Hsapiens.UCSC.hg19) before running the EpiTraceAge_Convergence code.
>
> We plan to fix this in the next cycle.

This also did not resolve the problem for me. I reproduced the error with the simple bulk ATAC example when running without an internet connection.