Hi,
For genomes not registered in GenomeInfoDb, you must specify the circ_seqs field, even if your genome does not have circular sequences (in which case circ_seqs should be set to character(0)).
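A missing circ_seqs field can be caught before forging with a quick check on the seed file. The sketch below is not part of BSgenome: check_seed_fields() is a hypothetical helper that relies on base R's read.dcf() (seed files are in DCF format) and only reports which of the requested fields are absent. The seed fragments it reads are throwaway examples, not a complete seed file.

```r
# Hypothetical helper, not part of BSgenome: returns the names of the
# requested fields that do not appear in the seed file.
check_seed_fields <- function(seed_file, fields = "circ_seqs") {
  seed <- read.dcf(seed_file)   # character matrix, one column per field
  setdiff(fields, colnames(seed))
}

# Throwaway seed fragment without circ_seqs.
incomplete_seed <- tempfile()
writeLines(c(
  "BSgenomeObjname: Pgranatum",
  "seqs_srcdir: /home/alex/work/genomepackage/seqs_srcdir"
), incomplete_seed)

# Same fragment with circ_seqs declared as empty.
complete_seed <- tempfile()
writeLines(c(
  "BSgenomeObjname: Pgranatum",
  "circ_seqs: character(0)",
  "seqs_srcdir: /home/alex/work/genomepackage/seqs_srcdir"
), complete_seed)

missing_before <- check_seed_fields(incomplete_seed)  # "circ_seqs"
missing_after  <- check_seed_fields(complete_seed)    # character(0): nothing missing
```

An empty result only means the field is present; it does not validate the value, which is still up to the forge function.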
Also:
- provider: genome? Why not put something a little bit more meaningful than "genome" for the name of the provider of your in-house genome? Note that you've named the package BSgenome.Pgranatum.ARO, which means that, if you are following the naming scheme for BSgenome data packages, the 3rd part (ARO) is the name of the provider. So why not set the provider field to that instead?
- provider_version: genome: As indicated by the warning you got, the provider_version field is deprecated in favor of the genome field. The value for this field should be the name of your in-house genome or assembly, so hopefully you can come up with a better name than "genome" for your in-house assembly. Think of how you're going to refer to this assembly in group meetings or when you communicate with co-workers. You're not going to refer to it as "genome", are you?
- Once you have a name for the assembly, e.g. PGv1, you should embed that name in the name of the package itself, e.g. BSgenome.Pgranatum.ARO.PGv1. The naming scheme for BSgenome data packages is to use a name made of 4 parts, the 4th part being the name of the genome or assembly.
- Make sure that your FASTA files are located in /home/alex/work/genomepackage/seqs_srcdir and that these files are named <seqname>.fa.

Finally, note that using a 2bit file for the genomic sequences is preferred over using a collection of FASTA files. Converting from FASTA to 2bit is easy to do in R: use Biostrings::readDNAStringSet() to import the FASTA file(s) as a DNAStringSet object and export that object with rtracklayer::export.2bit().
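The FASTA-to-2bit conversion could look roughly like the sketch below. It uses the two functions named above; the wrapper fasta_to_2bit(), its arguments, and the output file name in the usage comment are made up for illustration. It also assumes that each FASTA file holds exactly one sequence, so that the records can be renamed to match the seqnames listed in the seed file.

```r
# Sketch only: needs the Bioconductor packages Biostrings and rtracklayer.
# fasta_to_2bit() is a hypothetical wrapper, not an existing function.
fasta_to_2bit <- function(fasta_files, seqnames, out_file) {
  stopifnot(length(fasta_files) == length(seqnames),
            all(file.exists(fasta_files)))

  # Import all FASTA files as a single DNAStringSet object.
  dna <- Biostrings::readDNAStringSet(fasta_files)

  # Assumption: one record per file, in the same order as seqnames.
  stopifnot(length(dna) == length(seqnames))
  names(dna) <- seqnames

  # Write all the sequences to a single 2bit file.
  rtracklayer::export.2bit(dna, out_file)
  invisible(out_file)
}

# Possible usage with the layout from this thread (output name is made up):
# srcdir   <- "/home/alex/work/genomepackage/seqs_srcdir"
# seqnames <- paste0("Pgranatum.", 1:9)
# fasta_to_2bit(file.path(srcdir, paste0(seqnames, ".fa")),
#               seqnames,
#               file.path(srcdir, "Pgranatum.2bit"))
```

Renaming the records explicitly keeps the sequence names in the 2bit file independent of whatever the FASTA header lines happen to contain.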
Hope this helps, H.
It helps a lot! thanks )
Glad it helped.
Were you able to forge, install, load, and use the forged package? If so then feel free to close this issue.
Thanks
Yes, forged, installed and used. Issue is closed, thank you very much.
Is it possible to use an unpublished in-house genome FASTA to create a BSgenome object? We would like to be able to create a BSgenome object from our own, as yet unpublished, data.
I am trying to do it with the following seed file (Pgranatum.1.fa etc. are in the /home/alex/work/genomepackage/seqs_srcdir folder):
package: BSgenome.Pgranatum.ARO
Title: Full genome sequence for Punica granatum wonderful cultivar
Description: Full genome sequence for Punica granatum wonderful cultivar
Version: 1.0.0
organism: Punica Granatum
common_name: P. granatum
provider: genome
provider_version: genome
source_url:
organism_biocview: Punica_granatum
BSgenomeObjname: Pgranatum
seqnames: c("Pgranatum.1","Pgranatum.2", "Pgranatum.3","Pgranatum.4", "Pgranatum.5", "Pgranatum.6", "Pgranatum.7", "Pgranatum.8", "Pgranatum.9")
seqs_srcdir: /home/alex/work/genomepackage/seqs_srcdir
The error I am getting when trying to forge is: