Guan06 closed this issue 1 year ago.
Hi @Guan06,
For an unregistered genome, you must specify the seqnames field in your seed file.
If the genome is registered in GenomeInfoDb, then you don't need to specify the seqnames field because, in this case, forgeBSgenomeDataPkg() will be able to fetch the sequence names for you. So forging a BSgenome data package is just more convenient when the genome is registered in GenomeInfoDb, but registration is not a requirement.
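For an unregistered genome, the sequence names have to come from your own data. Below is a minimal sketch, assuming the 2bit file was built from a FASTA file with a tool such as UCSC's faToTwoBit, which typically keeps the first word of each FASTA header as the sequence name. The Python helpers fasta_seqnames() and seqnames_field() are hypothetical and not part of BSgenome. They collect those names and format them as a seqnames line for the seed file.

```python
import io
from typing import Iterable, List, TextIO


def fasta_seqnames(handle: TextIO) -> List[str]:
    """Return the sequence names found in a FASTA stream.

    Assumption: each name is the first whitespace-delimited word after '>'.
    """
    names = []
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            # Skip empty headers instead of failing on them.
            if header:
                names.append(header.split()[0])
    return names


def seqnames_field(names: Iterable[str]) -> str:
    """Format names as an R character vector for the seed file's seqnames field."""
    quoted = ", ".join('"{}"'.format(n) for n in names)
    return "seqnames: c({})".format(quoted)


if __name__ == "__main__":
    # Hypothetical FASTA content; in practice you would open your own .fa file.
    demo = io.StringIO(">Buni8492 Bacteroides uniformis ATCC 8492\nACGT\nTTGA\n")
    print(seqnames_field(fasta_seqnames(demo)))  # seqnames: c("Buni8492")
```

The names must match those stored in the 2bit file exactly, so it is worth checking them against the file you actually pass as seqfile_name.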
Hope this helps, H.
Thank you very much for your reply, Herve!
I added the seqnames field, but the error message is still there. Below are the non-standard DESCRIPTION fields of the seed file, which I wrote following section 2.2.3 of the vignette:
organism: Bacteroides uniformis ATCC8492
common_name: Bacteroides uniformis
genome: Bacteroides uniformis
provider: NCBI
release_date: 2022/09/12
source_url: https://www.ncbi.nlm.nih.gov/nuccore/CP102263.1
organism_biocview: Bacteroides_uniformis_ATCC8492
BSgenomeObjname: Buniformis_ATCC8492
seqnames: c("Buni8492")
circ_seqs: character(0)
seqs_srcdir: /rds-d6/user/rg684/hpc-work/bin/Buniformis_BSgenome/
seqfile_name: ATCC8492.2bit
ondisk_seq_format: rda
Inside the folder /rds-d6/user/rg684/hpc-work/bin/Buniformis_BSgenome/ I have the following files: ATCC8492.2bit and Buni8492.fa.
Thanks again and best, Rui
I added the seqnames field, but the error message is still there.
What error message? You never showed it.
I see that you have the following line in your seed file:
ondisk_seq_format: rda
I'm not quite sure what you're trying to achieve with this, but since you have the genomic sequences in a 2bit file (ATCC8492.2bit), you should not need to specify ondisk_seq_format.
H.
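The two pieces of advice in this thread (seqnames is required for an unregistered genome; ondisk_seq_format should not be needed when the sequences come from a 2bit file) can be checked before rerunning the forge. This is a minimal sketch in Python, assuming the seed file uses the usual DCF layout of "field: value" lines with indented continuation lines. parse_dcf() and check_seed() are hypothetical helpers, not BSgenome functions, and they do not reproduce forgeBSgenomeDataPkg()'s own validation.

```python
from typing import Dict, List, Optional


def parse_dcf(text: str) -> Dict[str, str]:
    """Parse simple DCF text: 'Field: value' lines plus indented continuations."""
    fields: Dict[str, str] = {}
    last: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0] in " \t" and last is not None:
            # Continuation line: append it to the previous field's value.
            fields[last] += " " + raw.strip()
            continue
        # Split at the first colon only, so URLs in values stay intact.
        key, sep, value = raw.partition(":")
        if not sep:
            raise ValueError("not a DCF field line: {!r}".format(raw))
        last = key.strip()
        fields[last] = value.strip()
    return fields


def check_seed(fields: Dict[str, str]) -> List[str]:
    """Return warnings that mirror the advice given in this thread."""
    warnings = []
    if "seqnames" not in fields:
        warnings.append("seqnames is missing (required for an unregistered genome)")
    if fields.get("seqfile_name", "").endswith(".2bit") and "ondisk_seq_format" in fields:
        warnings.append("ondisk_seq_format should not be needed with a 2bit seqfile_name")
    return warnings


if __name__ == "__main__":
    seed = (
        'seqnames: c("Buni8492")\n'
        "seqfile_name: ATCC8492.2bit\n"
        "ondisk_seq_format: rda\n"
    )
    for w in check_seed(parse_dcf(seed)):
        print("warning:", w)
```

Run against the fields posted above, this flags only the ondisk_seq_format line, which matches the fix that resolved the error here.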
Oh, sorry. It was the error saying that 'genome' is not included in registered_NCBI_assemblies() or registered_UCSC_genomes(). After removing the ondisk_seq_format field and rerunning everything, the error message was gone.
Thank you again for your help!
Best, Rui
Hi,
I am trying to forge a data package for the bacterium I am working with (Bacteroides uniformis). However, when preparing the seed file, I could not find any closely related genome of this species for the "genome" field in either registered_NCBI_assemblies() or registered_UCSC_genomes(). Any suggestions on what could be done to bypass this error?
Many thanks and looking forward to your suggestions!
Best, Rui