jang1563 closed this issue 3 months ago
This issue should go away if you specify the seqnames and circ_seqs fields in your seed file, so make sure both are specified.
Were you able to sort this out @jang1563?
@hpages I still get an error due to the missing 'genome' field.
Here is my seed file.
'mySeedfile'
Package: BSgenome.Scerevisiae.BY4743.2nd
Title: Full genome sequence for Saccharomyces Cerevisiae BY4743
Description: Full genome sequence for Saccharomyces Cerevisiae BY4743
Version: 1.0.0
BSgenomeObjname: Scerevisiae
seqnames: c("scaffold_1", "scaffold_2", "scaffold_3", "scaffold_4", "scaffold_5", "scaffold_6", "scaffold_7", "scaffold_8", "scaffold_9", "scaffold_10", "scaffold_11", "scaffold_12", "scaffold_13", "scaffold_14", "scaffold_15", "scaffold_16", "scaffold_17", "scaffold_18", "scaffold_19", "scaffold_20", "scaffold_21", "scaffold_22", "scaffold_23", "scaffold_24", "scaffold_25")
circ_seqs: character(0)
seqs_srcdir: /athena/masonlab/scratch/users/jak4013/Artemis/Artemis_I/Fasta/seqs_srcdir/split
With this seed file, I tried to run the following code.
forgeBSgenomeDataPkg("mySeedfile")
However, I encountered this error:
Error in forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir, : 'genome' field is missing in seed file
Do you have any suggestions?
The error message is pretty clear in this case: it tells you that the genome field is missing in your seed file. So my suggestion is that you add it :wink:
Please refer to the How to forge a BSgenome data package vignette in the BSgenome package for more information.
Best
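One way to catch this before calling forgeBSgenomeDataPkg() is to read the seed file yourself and see which fields are absent. Seed files are in DCF format, so base R's read.dcf() can parse them. The sketch below is only an illustration: the helper name missing_seed_fields and the demo seed file are made up for this example, and the default field list covers only the three fields discussed in this thread, not everything the vignette requires.

```r
# Hypothetical helper (not part of BSgenome): report which of the given
# fields are absent from a seed file. The default list only covers the
# fields discussed in this thread.
missing_seed_fields <- function(seed_path,
                                fields = c("genome", "seqnames", "circ_seqs")) {
  seed <- read.dcf(seed_path)        # one row, one column per field
  setdiff(fields, colnames(seed))    # fields that were not found
}

# Self-contained demo on a throwaway seed file that lacks 'genome'.
demo_seed <- tempfile()
writeLines(c(
  "Package: BSgenome.Scerevisiae.BY4743.2nd",
  "BSgenomeObjname: Scerevisiae",
  "seqnames: c(\"scaffold_1\", \"scaffold_2\")",
  "circ_seqs: character(0)"
), demo_seed)

missing_seed_fields(demo_seed)
# [1] "genome"
```

An empty result (character(0)) would mean all of the listed fields are present in the seed file.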
The manual says the genome field is not necessary, but forging failed because the genome field was missing. This point is confusing to me. As I mentioned in my first question, this reference is custom, so there is no matching genome in NCBI or UCSC.
> The manual says the genome field is not necessary
Not sure what manual you are looking at but that is not what it says. Here is the vignette you want to consult: https://bioconductor.org/packages/release/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf
In the vignette, fields that are not necessary are marked with a big upper case OPTIONAL e.g.:
PkgDetail: [OPTIONAL] Some arbitrary text that will be copied to the Details section of the man page of the target package.
For genome we have:
genome: The name of the genome. Typically the name of an NCBI assembly (e.g. GRCh38.p12, WBcel235, TAIR10.1, ARS-UCD1.2, etc...) or UCSC genome (e.g. hg38, bosTau9, galGal6, ce11, etc...). Should preferably match part 4 of the package name (field Package). For the packages built by the Bioconductor project from a UCSC genome, this field corresponds to the UCSC VERSION field of the List of UCSC genome releases table.
So yes, the genome field is required.
> This point is confusing to me. As I mentioned in the 1st question, this reference is custom so there is no matching genome in NCBI or UCSC.
Just give your genome a name (note that the name should not contain spaces or special characters other than .). Who says it has to match a genome in NCBI or UCSC? "Typically" in "Typically the name of an NCBI assembly or UCSC genome" doesn't mean "it must be".
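As a rough illustration of that naming rule, a candidate name can be checked with a regular expression before it goes into the seed file. This is only a sketch under an assumption: it treats every character other than ASCII letters, digits, and "." as "special", which is one reading of the advice above and not a rule taken from the BSgenome code. The function name is_plain_genome_name and the example names are made up.

```r
# Hypothetical check (not part of BSgenome): TRUE when a name consists only
# of ASCII letters, digits, and ".". Treating everything else (spaces,
# hyphens, underscores, ...) as "special" is an assumption of this sketch.
is_plain_genome_name <- function(x) grepl("^[A-Za-z0-9.]+$", x)

is_plain_genome_name(c("BY4743", "Scer.BY4743", "Scer-BY4743", "my genome"))
# [1]  TRUE  TRUE FALSE FALSE
```

A name that passes this check could then be added to the seed file as the value of the genome field.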
I solved this issue with this seed file.
In this case, it worked well without the 'genome' field.
Package: BSgenome.Scerevisiae.BY4743.04.11.2024.ver3
Title: Full genome sequence for Saccharomyces Cerevisiae BY4743
Description: Full genome sequence for Saccharomyces Cerevisiae BY4743
Version: 1.0.0
organism: Yeast
common_name: Yeast
organism_biocview: Yeast
provider: JK
provider_version: ONT
release_date: April, 2024
release_name: Scerevisiae.BY4743
source_url: NA
BSgenomeObjname: Scerevisiae
seqnames: c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25')
circ_seqs: character(0)
seqs_srcdir: /athena/masonlab/scratch/users/jak4013/Artemis/Artemis_I/Fasta/seqs_srcdir/split/chr
Hi,
I'm trying to forge a BSgenome data package for Saccharomyces cerevisiae BY4743, downloaded from here (https://www.atcc.org/products/201390). The issue is that this strain is not registered in either NCBI or UCSC.
So I encountered this error:
Error in .make_Seqinfo_from_genome(genome): "Scer-BY4743" is not a registered NCBI assembly or UCSC genome (use registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or UCSC assemblies/genomes currently registered in the GenomeInfoDb package)
Do you have any suggestions for solving this issue?