Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

Errors creating BSgenome for rn7 #18

zrcjessica closed this issue 3 years ago

zrcjessica commented 3 years ago

Hello,

I need to create a new BSgenome package for rn7 and I have created the following seed file:

Package: BSgenome.Rnorvegicus.UCSC.rn7
Title: Full genome sequences for Rattus norvegicus (UCSC version rn7)
Description: Full genome sequences for Rattus norvegicus (Rat) as provided by UCSC (rn7, Nov. 2020)
Version: 1.0.0
organism: Rattus norvegicus
common_name: Rat
genome: rn7
provider: UCSC
release_date: Nov. 2020
source_url: https://hgdownload.soe.ucsc.edu/goldenPath/rn7/bigZips/
organism_biocview: Rattus_norvegicus
BSgenomeObjname: Rnorvegicus
SrcDataFiles: rn7.2bit from http://hgdownload.cse.ucsc.edu/goldenPath/rn7/bigZips/
seqs_srcdir: /iblm/netapp/data1/external/UCSC/rn7
seqfile_name: rn7.2bit

However, I get the following error:

Error in .make_Seqinfo_from_genome(genome) : 
  "rn7" is not a registered NCBI assembly or UCSC genome (use
  registered_NCBI_assemblies() or registered_UCSC_genomes() to list the
  NCBI or UCSC assemblies/genomes currently registered in the
  GenomeInfoDb package)

I saw someone else report this in #16, where one commenter said they fixed it by adding circ_seqs: character(0) to the seed file when using a 2bit file. However, when I add circ_seqs: character(0) to my seed file, I get a different error:

Creating package in ./BSgenome.Rnorvegicus.UCSC.rn7 
Copying '/iblm/netapp/data1/external/UCSC/rn7/rn7.2bit' to './BSgenome.Rnorvegicus.UCSC.rn7/inst/extdata/single_sequences.2bit' ... Error in .copyTwobitFile(seqfile_name, seqs_srcdir, seqs_destdir, verbose = verbose) : 
  FAILED
In addition: Warning message:
In file.create(to[okay]) :
  cannot create file './BSgenome.Rnorvegicus.UCSC.rn7/inst/extdata/single_sequences.2bit', reason 'No such file or directory'

Can anyone help me resolve this issue? Thanks!

hpages commented 3 years ago

Hi @zrcjessica,

I've registered the rn7 genome in GenomeInfoDb 1.28.2 (BioC 3.13) and 1.29.4 (BioC 3.14). These new versions will become available via BiocManager::install() in the next 48h.
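Once the update is installed, one way to confirm that a genome name is registered is to look it up in the table returned by registered_UCSC_genomes(), the function named in the error message above. The following is a minimal sketch, assuming GenomeInfoDb is installed; is_registered_ucsc() is a hypothetical helper, not part of any package, and its `registered` argument exists only so the check can be tried without GenomeInfoDb.

```r
# Hypothetical helper (not part of GenomeInfoDb): returns TRUE if `genome`
# appears among the UCSC genomes registered in the installed GenomeInfoDb.
# `registered` can be supplied directly to skip the GenomeInfoDb lookup.
is_registered_ucsc <- function(genome, registered = NULL) {
    if (is.null(registered)) {
        # registered_UCSC_genomes() returns a data.frame with a `genome` column
        registered <- GenomeInfoDb::registered_UCSC_genomes()$genome
    }
    genome %in% as.character(registered)
}

# Usage, assuming GenomeInfoDb is installed:
# packageVersion("GenomeInfoDb")   # compare with the versions mentioned above
# is_registered_ucsc("rn7")
```

If the lookup returns FALSE, the installed GenomeInfoDb probably predates the registration, and updating it via BiocManager::install("GenomeInfoDb") would be the first thing to try.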

Note that I'm also in the process of forging and adding BSgenome.Rnorvegicus.UCSC.rn7 to Bioconductor. I'll report here when this is ready.

Best, H.

hpages commented 3 years ago

BSgenome.Rnorvegicus.UCSC.rn7 is now in Bioconductor. It will become available via BiocManager::install() in the next 3h.

LiNk-NY commented 3 years ago

Hi Hervé,

It would be good to have the script that you used to generate the package available somewhere, so that community members (myself included) can learn from you and successfully build the package next time.

Thanks so much!

Best, Marcel

hpages commented 3 years ago

No script. Just preparing a seed file for rn7 like @zrcjessica did (I included the new seed file in the BSgenome package in inst/extdata/GentlemanLab/) and running forgeBSgenomeDataPkg() on it. The process is documented in the BSgenomeForge vignette.
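For readers who want to repeat this, the forging step can be sketched roughly as follows. This is a sketch under assumptions, not the exact commands used here: forge_from_seed() is a hypothetical wrapper, and the seed file path in the usage comment is a placeholder. At the time of this thread forgeBSgenomeDataPkg() lived in BSgenome; more recent Bioconductor releases provide it in the separate BSgenomeForge package, so the wrapper looks in both.

```r
# Hypothetical wrapper around forgeBSgenomeDataPkg(); not part of any package.
# It fails early if the seed file is missing, then calls whichever package
# provides forgeBSgenomeDataPkg() in the current installation.
forge_from_seed <- function(seed_file) {
    stopifnot(is.character(seed_file), length(seed_file) == 1L,
              file.exists(seed_file))
    # Assumption: BSgenomeForge is preferred when installed, else BSgenome.
    pkg <- if (requireNamespace("BSgenomeForge", quietly = TRUE)) {
        "BSgenomeForge"
    } else {
        "BSgenome"
    }
    forge <- getExportedValue(pkg, "forgeBSgenomeDataPkg")
    # With default arguments the package source tree is created in the
    # current working directory.
    forge(seed_file)
}

# Usage (placeholder path, adjust to your own seed file):
# forge_from_seed("BSgenome.Rnorvegicus.UCSC.rn7-seed")
```

The resulting directory is an ordinary R package source tree, so it can then be built and installed with R CMD build and R CMD INSTALL.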

But first rn7 needed to be registered in GenomeInfoDb, which is why it was not working for @zrcjessica. So maybe your question, Marcel, is how to register a new genome in GenomeInfoDb?

keroguynes commented 1 year ago

Dear @hpages,

I have the exact same issue. In 2022 I successfully created BSgenome packages for three different species that are not registered in NCBI or UCSC. I have since updated the seed.dcf file to remove deprecated fields, but I now get the same error about the assembly not being registered. I am not sure exactly what to enter in the genome field.

Here are the links to the NCBI assemblies, in case they are useful: https://www.ncbi.nlm.nih.gov/assembly/GCA_903813345.2/ https://www.ncbi.nlm.nih.gov/assembly/GCA_000328365.1/ https://www.ncbi.nlm.nih.gov/assembly/GCA_904063045.1/

Many thanks in advance for your help!

hpages commented 1 year ago

@keroguynes Please open a new issue and share more details, e.g. show your seed file, the output produced by forgeBSgenomeDataPkg(), and your sessionInfo(). Thanks!