Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

Use unpublished in-house genome fasta for creating a BSgenome object #59

Closed: alslonik closed this issue 1 year ago

alslonik commented 1 year ago

Is it possible to use an unpublished in-house genome FASTA to create a BSgenome object? We would like to be able to create a BSgenome object from our own, as yet unpublished, data.

I am trying to do it with the following seed file (Pgranatum.1.fa etc. are in the /home/alex/work/genomepackage/seqs_srcdir folder):

package: BSgenome.Pgranatum.ARO
Title: Full genome sequence for Punica granatum wonderful cultivar
Description: Full genome sequence for Punica granatum wonderful cultivar
Version: 1.0.0
organism: Punica Granatum
common_name: P. granatum
provider: genome
provider_version: genome
source_url:
organism_biocview: Punica_granatum
BSgenomeObjname: Pgranatum
seqnames: c("Pgranatum.1","Pgranatum.2", "Pgranatum.3","Pgranatum.4", "Pgranatum.5", "Pgranatum.6", "Pgranatum.7", "Pgranatum.8", "Pgranatum.9")
seqs_srcdir: /home/alex/work/genomepackage/seqs_srcdir

The error I am getting when trying to forge is:

Error in .make_Seqinfo_from_genome(genome) : 
  "genome" is not a registered NCBI assembly or UCSC genome (use registered_NCBI_assemblies() or registered_UCSC_genomes() to
  list the NCBI or UCSC assemblies/genomes currently registered in the GenomeInfoDb package)
In addition: Warning message:
In forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,  :
  field 'provider_version' is deprecated in favor of 'genome'

hpages commented 1 year ago

Hi,

For genomes not registered in GenomeInfoDb, you must specify the circ_seqs field, even if your genome does not have circular sequences (in which case circ_seqs should be set to character(0)).
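As a rough illustration, the seed file from the question could be regenerated with those two changes: `provider_version` replaced by `genome` (per the deprecation warning) and `circ_seqs` added. This is only a sketch. The genome label `Pgranatum_v1` and the seed file name are placeholders invented here, and the other fields are copied from the question as posted.

```r
# Sketch: write a seed file for an unregistered genome.
# Assumptions (not from the thread): the genome label "Pgranatum_v1" and the
# seed file name are hypothetical placeholders.
seed <- c(
  package           = "BSgenome.Pgranatum.ARO",
  Title             = "Full genome sequence for Punica granatum wonderful cultivar",
  Description       = "Full genome sequence for Punica granatum wonderful cultivar",
  Version           = "1.0.0",
  organism          = "Punica Granatum",
  common_name       = "P. granatum",
  provider          = "genome",
  genome            = "Pgranatum_v1",                 # replaces the deprecated provider_version
  source_url        = "",
  organism_biocview = "Punica_granatum",
  BSgenomeObjname   = "Pgranatum",
  circ_seqs         = "character(0)",                 # no circular sequences
  seqnames          = 'paste0("Pgranatum.", 1:9)',    # R expression, same names as in the question
  seqs_srcdir       = "/home/alex/work/genomepackage/seqs_srcdir"
)

# One "field: value" line per entry (DCF layout)
seed_path <- file.path(tempdir(), "BSgenome.Pgranatum.ARO-seed")
writeLines(paste0(names(seed), ": ", seed), seed_path)

# Forging would then be along these lines (not run here); depending on the
# Bioconductor release, the function is provided by BSgenome or BSgenomeForge:
# forgeBSgenomeDataPkg(seed_path)
```

The point of the sketch is only the `genome` and `circ_seqs` lines; everything else is unchanged from the original seed file.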

Also:

Finally, note that using a 2bit file for the genomic sequences is preferred over using a collection of FASTA files. Converting from FASTA to 2bit is easy to do in R: use Biostrings::readDNAStringSet() to import the FASTA file(s) as a DNAStringSet object, then export that object with rtracklayer::export.2bit().
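A minimal sketch of that conversion, assuming the FASTA files all sit in one directory and end in `.fa`. The helper name `fasta_to_2bit` and the output file name are made up for illustration; only `readDNAStringSet()` and `export.2bit()` come from the packages.

```r
library(Biostrings)   # readDNAStringSet()
library(rtracklayer)  # export.2bit()

# Hypothetical helper: read every FASTA file in `fasta_dir` and write all
# sequences into a single 2bit file at `twobit_path`.
fasta_to_2bit <- function(fasta_dir, twobit_path, pattern = "\\.fa$") {
  fasta_files <- list.files(fasta_dir, pattern = pattern, full.names = TRUE)
  stopifnot(length(fasta_files) > 0)

  # readDNAStringSet() accepts a vector of file paths and returns one
  # DNAStringSet; sequence names are taken from the FASTA header lines.
  genome <- readDNAStringSet(fasta_files)

  export.2bit(genome, twobit_path)

  # Return the sequence names so they can be compared with the seed file.
  invisible(names(genome))
}
```

For the layout in the question this would be called as `fasta_to_2bit("/home/alex/work/genomepackage/seqs_srcdir", "Pgranatum.2bit")`. The names returned are whatever follows `>` in each FASTA header, so it is worth checking that they match the sequence names expected in the seed file.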

Hope this helps, H.

alslonik commented 1 year ago

It helps a lot! thanks )

hpages commented 1 year ago

Glad it helped.

Were you able to forge, install, load, and use the forged package? If so then feel free to close this issue.

Thanks

alslonik commented 1 year ago

Yes, forged, installed and used. Issue is closed, thank you very much.