Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

forgeBSgenomeDataPkg: not a registered NCBI assembly or UCSC genome error #62

Closed: keroguynes closed this issue 1 year ago

keroguynes commented 1 year ago

Dear @hpages,

When I try to run forgeBSgenomeDataPkg, I get the following error: Error in .make_Seqinfo_from_genome(genome) :

I'm attaching the seed.dcf file and the R session info below. This is just for one of the three species for which I'm creating a BSgenome.

Package: OfusTxdb
Title: Full genome sequence for Owenia fusiformis N6347
Description: Full genome sequence for all scaffolds of Owenia fusiformis N6347 provided by NCBI
Suggests: GenomicFeatures
organism: Owenia fusiformis
common_name: Owenia fusiformis
genome: 1.0.0
provider: NCBI
release_date: 2022/03
BSgenomeObjname: ofus
source_url: https://www.ncbi.nlm.nih.gov/assembly/GCA_903813345.2/
organism_biocview: AnnotationData, BSgenome
SrcDataFiles: Owenia_unmasked_v082020.2bit
seqfile_name: Owenia_unmasked_v082020.2bit
seqfiles_suffix: .2bit
seqs_srcdir: /Users/chema/Desktop/ofus/seqs_srcdir

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome_1.64.0        rtracklayer_1.56.1     Biostrings_2.64.1      XVector_0.36.0         GenomicFeatures_1.48.4
 [6] AnnotationDbi_1.58.0   Biobase_2.56.0         GenomicRanges_1.48.0   GenomeInfoDb_1.32.4    IRanges_2.30.1        
[11] S4Vectors_0.34.0       BiocGenerics_0.42.0   

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3            bsseq_1.32.0                rjson_0.2.21                ellipsis_0.3.2             
  [5] circlize_0.4.15             GlobalOptions_0.1.2         fs_1.5.2                    clue_0.3-63                
  [9] rstudioapi_0.14             remotes_2.4.2               bit64_4.0.5                 fansi_1.0.3                
 [13] xml2_1.3.3                  codetools_0.2-18            R.methodsS3_1.8.2           sparseMatrixStats_1.8.0    
 [17] doParallel_1.0.17           cachem_1.0.6                pkgload_1.3.2               Rsamtools_2.12.0           
 [21] dbplyr_2.2.1                cluster_2.1.4               png_0.1-8                   R.oo_1.25.0                
 [25] shiny_1.7.4                 HDF5Array_1.24.2            BiocManager_1.30.19         compiler_4.2.2             
 [29] httr_1.4.4                  assertthat_0.2.1            Matrix_1.5-3                fastmap_1.1.0              
 [33] limma_3.52.4                cli_3.6.0                   later_1.3.0                 htmltools_0.5.4            
 [37] prettyunits_1.1.1           tools_4.2.2                 gtable_0.3.1                glue_1.6.2                 
 [41] GenomeInfoDbData_1.2.8      dplyr_1.0.10                rappdirs_0.3.3              Rcpp_1.0.9                 
 [45] vctrs_0.5.1                 rhdf5filters_1.8.0          iterators_1.0.14            DelayedMatrixStats_1.18.2  
 [49] stringr_1.5.0               ps_1.7.2                    mime_0.12                   miniUI_0.1.1.1             
 [53] lifecycle_1.0.3             restfulr_0.0.15             gtools_3.9.4                devtools_2.4.5             
 [57] XML_3.99-0.13               zlibbioc_1.42.0             scales_1.2.1                hms_1.1.2                  
 [61] promises_1.2.0.1            MatrixGenerics_1.8.1        parallel_4.2.2              SummarizedExperiment_1.26.1
 [65] rhdf5_2.40.0                RColorBrewer_1.1-3          curl_4.3.3                  ComplexHeatmap_2.12.1      
 [69] yaml_2.3.6                  memoise_2.0.1               ggplot2_3.4.0               biomaRt_2.52.0             
 [73] stringi_1.7.8               RSQLite_2.2.20              BiocIO_1.6.0                foreach_1.5.2              
 [77] permute_0.9-7               filelock_1.0.2              pkgbuild_1.4.0              BiocParallel_1.30.4        
 [81] shape_1.4.6                 rlang_1.0.6                 pkgconfig_2.0.3             matrixStats_0.63.0         
 [85] bitops_1.0-7                lattice_0.20-45             purrr_1.0.0                 Rhdf5lib_1.18.2            
 [89] GenomicAlignments_1.32.1    htmlwidgets_1.6.1           bit_4.0.5                   processx_3.8.0             
 [93] tidyselect_1.2.0            plyr_1.8.8                  magrittr_2.0.3              R6_2.5.1                   
 [97] generics_0.1.3              profvis_0.3.7               DelayedArray_0.22.0         DBI_1.1.3                  
[101] pillar_1.8.1                KEGGREST_1.36.3             RCurl_1.98-1.9              tibble_3.1.8               
[105] crayon_1.5.2                utf8_1.2.2                  BiocFileCache_2.4.0         urlchecker_1.0.1           
[109] progress_1.2.2              GetoptLong_1.0.5            usethis_2.1.6               locfit_1.5-9.7             
[113] grid_4.2.2                  data.table_1.14.6           blob_1.2.3                  callr_3.7.3                
[117] digest_0.6.31               xtable_1.8-4                httpuv_1.6.7                R.utils_2.12.2             
[121] munsell_0.5.0               sessioninfo_1.2.2 

Thank you for your help in advance!

hpages commented 1 year ago

You're not showing the error message that you got!

Note that for assemblies that are not registered in the GenomeInfoDb package, you must provide the seqnames and circ_seqs fields, e.g.:

seqnames: getChromInfoFromNCBI("GCA_903813345.2")$SequenceName
circ_seqs: character(0)
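
To see what these two fields would evaluate to before forging, the sequence table can be inspected in an R session. The following is only a sketch: it assumes network access, that the rtracklayer package (already attached in the session above) is used to read the 2bit file, and that the file path is the one formed by the seqs_srcdir and seqfile_name fields of the posted seed file. Whether the names in the 2bit file match the NCBI SequenceName column is not something this thread confirms, so the last step just reports any difference.

```r
library(GenomeInfoDb)   # provides getChromInfoFromNCBI()
library(rtracklayer)    # provides TwoBitFile()

# Fetch the sequence table for the assembly (requires network access).
chrominfo <- getChromInfoFromNCBI("GCA_903813345.2")
head(chrominfo[, c("SequenceName", "SequenceLength", "circular")])

# Names of circular sequences, if any; an empty result corresponds to character(0).
chrominfo$SequenceName[which(chrominfo$circular)]

# Path assumed from the seqs_srcdir and seqfile_name fields of the seed file.
twobit_path <- file.path("/Users/chema/Desktop/ofus/seqs_srcdir",
                         "Owenia_unmasked_v082020.2bit")
twobit_names <- seqnames(seqinfo(TwoBitFile(twobit_path)))

# Sequence names present in the 2bit file but not listed by NCBI.
setdiff(twobit_names, chrominfo$SequenceName)
```

If the last call returns names, the seqnames field probably needs to list the names as they appear in the 2bit file rather than the NCBI ones.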

Also please take the time to carefully read the "How to forge a BSgenome data package" vignette from the BSgenome package. In particular the section about how to "Prepare the BSgenome data package seed file" will help you realize that:

Hope this helps, H.

hpages commented 1 year ago

@keroguynes Did this help? Were you able to forge the package?

keroguynes commented 1 year ago

Dear @hpages,

I was able to forge the package. For anyone else who may be interested, I added the following information for it to work:

Package: **species**
Title: Full genome sequence for **species**
Description: Full genome sequence for all scaffolds of **species** provided by NCBI
Version: 1.0.0
organism: **species**
common_name: **species**
genome: 1.0.0
provider: NCBI
release_date: year/month
BSgenomeObjname: **species**
source_url: NCBI
organism_biocview: AnnotationData, BSgenome
SrcDataFiles: species.2bit
seqfile_name: species.2bit
seqs_srcdir: ~/seqs_srcdir
circ_seqs: character(0)
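
For anyone adapting this, a quick check of the seed file before forging may save a round trip. The sketch below uses only base R; missing_seed_fields is a hypothetical helper, and the checklist it is given (Version and circ_seqs, the two fields I ended up adding) is an assumption drawn from this thread, not the full list of fields the vignette requires.

```r
# Hypothetical helper: report which of the given fields a seed file lacks.
missing_seed_fields <- function(seed_path, fields) {
  seed <- read.dcf(seed_path)   # one-row character matrix, one column per field
  setdiff(fields, colnames(seed))
}

# Demo on a throwaway seed file that holds only a few of the fields above.
seed_path <- tempfile(fileext = ".dcf")
writeLines(c("Package: species",
             "Version: 1.0.0",
             "genome: 1.0.0",
             "seqfile_name: species.2bit"),
           seed_path)

# Assumed checklist: the two fields added in this thread.
missing <- missing_seed_fields(seed_path, c("Version", "circ_seqs"))
print(missing)
# [1] "circ_seqs"

# With a complete seed file, forging is then a single call (not run here):
# BSgenome::forgeBSgenomeDataPkg("path/to/seed.dcf")
```

The forge call is left commented out because it needs the real seed file and the 2bit file on disk.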

Thank you for your help. I will close this query now.