Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
forgeBSgenomeDataPkg: not a registered NCBI assembly or UCSC genome error #62

keroguynes commented 1 year ago

Dear @hpages,

I am getting the following error when I try to run forgeBSgenomeDataPkg: Error in .make_Seqinfo_from_genome(genome) :

I'm attaching the seed.dcf file and the R session info below. This is just for one of the three species for which I'm creating a BSgenome.

Package: OfusTxdb
Title: Full genome sequence for Owenia fusiformis N6347
Description: Full genome sequence for all scaffolds of Owenia fusiformis N6347 provided by NCBI
Suggests: GenomicFeatures
organism: Owenia fusiformis
common_name: Owenia fusiformis
genome: 1.0.0
provider: NCBI
release_date: 2022/03
BSgenomeObjname: ofus
organism_biocview: AnnotationData, BSgenome
SrcDataFiles: Owenia_unmasked_v082020.2bit
seqfile_name: Owenia_unmasked_v082020.2bit
seqfiles_suffix: .2bit
seqs_srcdir: /Users/chema/Desktop/ofus/seqs_srcdir

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome_1.64.0        rtracklayer_1.56.1     Biostrings_2.64.1      XVector_0.36.0         GenomicFeatures_1.48.4
 [6] AnnotationDbi_1.58.0   Biobase_2.56.0         GenomicRanges_1.48.0   GenomeInfoDb_1.32.4    IRanges_2.30.1        
[11] S4Vectors_0.34.0       BiocGenerics_0.42.0   

Thank you for your help in advance!

hpages commented 1 year ago

You're not showing the error message that you got!
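
One way to capture the complete message for a report is to wrap the call in tryCatch(). The sketch below uses only base R. capture_error_message is a hypothetical helper name, and the commented-out call assumes the seed file is named seed.dcf and sits in the working directory.

```r
# Hypothetical helper: evaluate an expression and return the full error
# message as a string, or NULL if no error was raised.
capture_error_message <- function(expr) {
  tryCatch({
    force(expr)   # the expression is evaluated here, inside tryCatch()
    NULL
  }, error = function(e) conditionMessage(e))
}

# Not run here: needs the BSgenome package and a real seed file.
# msg <- capture_error_message(BSgenome::forgeBSgenomeDataPkg("seed.dcf"))
# cat(msg, "\n")
```

Pasting the printed message together with the seed file and sessionInfo() output gives the full picture in one post.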

Note that for assemblies that are not registered in the GenomeInfoDb package, you must provide the seqnames and circ_seqs fields, e.g.:

seqnames: getChromInfoFromNCBI("GCA_903813345.2")$SequenceName
circ_seqs: character(0)
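
Before forging, it may be worth checking that the names produced by the seqnames expression match the sequence names stored in the .2bit file. The following is a minimal sketch: compare_seqnames is a hypothetical helper, and the commented-out lines assume network access, the GenomeInfoDb and rtracklayer packages, and the .2bit file from the seed above in the working directory.

```r
# Hypothetical helper: report names that appear on only one side.
compare_seqnames <- function(seed_names, file_names) {
  list(
    missing_from_file = setdiff(seed_names, file_names),
    missing_from_seed = setdiff(file_names, seed_names)
  )
}

# Not run here (needs network access and the sequence file):
# library(GenomeInfoDb)
# library(rtracklayer)
# seed_names <- getChromInfoFromNCBI("GCA_903813345.2")$SequenceName
# file_names <- seqnames(seqinfo(TwoBitFile("Owenia_unmasked_v082020.2bit")))
# compare_seqnames(seed_names, file_names)
```

If both elements of the result are empty, the two sets of names agree.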

Also please take the time to carefully read the "How to forge a BSgenome data package" vignette from the BSgenome package. In particular the section about how to "Prepare the BSgenome data package seed file" will help you realize that:

Hope this helps, H.

hpages commented 1 year ago

@keroguynes Did this help? Were you able to forge the package?

keroguynes commented 1 year ago

Dear @hpages,

I was able to forge the package. For anyone else who may be interested, I added the following information for it to work:

Package: **species**
Title: Full genome sequence for **species**
Description: Full genome sequence for all scaffolds of **species** provided by NCBI
Version: 1.0.0
organism: **species**
common_name: **species**
genome: 1.0.0
provider: NCBI
release_date: year/month
BSgenomeObjname: **species**
source_url: NCBI
organism_biocview: AnnotationData, BSgenome
SrcDataFiles: species.2bit
seqfile_name: species.2bit
seqs_srcdir: ~/seqs_srcdir
circ_seqs: character(0)
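
A seed file is in DCF format, so base R's read.dcf() can be used to check it before forging. The sketch below compares a seed file against the fields of the working seed shown above. The field list is taken from this thread only and is an assumption, not the authoritative list of required fields; missing_seed_fields and thread_seed_fields are hypothetical names.

```r
# Fields present in the seed file that worked in this thread.
# Assumption: this is NOT the authoritative list of required fields;
# the BSgenome forging vignette is the reference for that.
thread_seed_fields <- c(
  "Package", "Title", "Description", "Version", "organism", "common_name",
  "genome", "provider", "release_date", "BSgenomeObjname", "source_url",
  "organism_biocview", "SrcDataFiles", "seqfile_name", "seqs_srcdir",
  "circ_seqs"
)

# Return the fields from `required` that are absent from the seed file.
missing_seed_fields <- function(seed_path, required = thread_seed_fields) {
  seed <- read.dcf(seed_path)   # character matrix, one column per field
  setdiff(required, colnames(seed))
}

# Not run here: forging needs the BSgenome package and the sequence file.
# if (length(missing_seed_fields("seed.dcf")) == 0) {
#   BSgenome::forgeBSgenomeDataPkg("seed.dcf")
# }
```

Compared with the first seed file posted in this thread, the working one adds Version, source_url and circ_seqs, and drops Suggests and seqfiles_suffix.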

Thank you for your help. I will close this query now.