Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

Can I forge an unregistered NCBI? (Am I doing this right?) #27

Closed stevenjblair closed 2 years ago

stevenjblair commented 2 years ago

EDIT: I noticed this morning in the closed issues that this has already been asked, and all I need to do is put seqnames into the seed file. Now I have another error, haha, but I think this one is just a syntax error on my part.

Hello. From reading Biostars, GitHub, and other sources, I now understand that the assembly needs to be registered. I have also read that it is best to convert the FASTA to 2bit.

2bit is not an option for me because the genome I am working with is too big. I need a way around 2bit, and I hope you can help me register the genome. All of the resources appear to be available: for instance, getChromInfoFromNCBI("GCA_002915635.3") returns data, so I am hopeful that with a little more help I can make this work.

The assembly is for Ambystoma mexicanum https://www.ncbi.nlm.nih.gov/assembly/GCA_002915635.3

If it is possible to forge assemblies that are not found in registered_NCBI_assemblies(), then perhaps I could have someone look at my seed?

The seed is:

Package: BSgenome.Amexicanum.NCBI.ambMex60DD
Title: Ambystoma Mexicanum (Axolotl) full genome (Schloissnig version V6.0-DD)
Description: A chromosome-scale assembly of the axolotl genome as provided by Schloissnig (v6.0-DD, April. 2021)
Version: 1.0.0
organism: Ambystoma mexicanum
common_name: axolotl
genome: GCA_002915635.3
provider: Schloissnig
release_date: April, 2021
source_url: https://www.ncbi.nlm.nih.gov/assembly/GCA_002915635.3
organism_biocview: Ambystoma_mexicanum
seqs_srcdir: $SCRATCH/GCA_002915635.3
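As the EDIT at the top notes, the seed above lacks a seqnames field. The following is a minimal sketch of how such a line could be generated, not a confirmed fix. It assumes that seqnames and circ_seqs take an R expression returning a character vector, as described in the BSgenomeForge documentation, and that the sequence files in seqs_srcdir are laid out the way that field expects. The helper `seed_field` and the example names are hypothetical and are not part of BSgenome.

```r
# Hypothetical helper (not part of BSgenome): format a character vector as the
# R expression used on the right-hand side of a seed-file field.
seed_field <- function(field, values) {
  stopifnot(is.character(field), length(field) == 1L, is.character(values))
  expr <- if (length(values) == 0L) {
    "character(0)"  # e.g. no circular sequences
  } else {
    paste0("c(", paste0('"', values, '"', collapse = ", "), ")")
  }
  paste0(field, ": ", expr)
}

# Placeholder names for illustration only; the real names must match the
# sequence names / file names found in seqs_srcdir.
example_names <- c("chr1p", "chr1q", "chr2p")
writeLines(seed_field("seqnames", example_names))
writeLines(seed_field("circ_seqs", character(0)))

# With GenomeInfoDb installed and network access, the names could instead come
# from the call mentioned earlier in this post. Which column matches the FASTA
# headers is an assumption that needs checking:
# chrominfo <- GenomeInfoDb::getChromInfoFromNCBI("GCA_002915635.3")
# writeLines(seed_field("seqnames", chrominfo$SequenceName))
```

The printed lines can be pasted into the seed file before the seqs_srcdir entry. Whether circ_seqs should really be empty for this assembly is also an assumption to verify.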

The error is:

Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'seqlevels': "GCA_002915635.3" is not a registered -NCBI assembly or UCSC genome (use registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or UCSC assemblies/genomes currently registered in the GenomeInfoDb package)

Any help would be great, thank you.

Edit to add: Can I use multiple 2bit files to forge a BSgenome?