Closed edicliuyang closed 8 months ago
Hi Herve,
Thanks for the quick response! When I run the code above, I encounter this error. Is there a way to solve this problem?
forgeBSgenomeDataPkgFromNCBI("GCF_016772045.1",
pkg_maintainer="Jane Doe @.***>",
organism="ARSUIRambv2",
circ_seqs=character(0))
Error in .sort_and_rename_fasta_sequences(dna, assembly_accession) : number of sequences in FASTA file does not match number of sequences in 'getChromInfoFromNCBI("GCF_016772045.1")'
Best,
Yang
On Tue, Jan 2, 2024 at 9:24 PM Hervé Pagès @.***> wrote:
See Bioconductor/BSgenomeForge#34 (comment) https://github.com/Bioconductor/BSgenomeForge/issues/34#issuecomment-1718445966
Oops... indeed!
The assembly report for GCF_016772045.1 (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/016/772/045/GCF_016772045.1_ARS-UI_Ramb_v2.0/GCF_016772045.1_ARS-UI_Ramb_v2.0_assembly_report.txt) has some peculiarities not seen before (NAs in the RefSeqAccn column) that break some of the sanity checks performed internally by forgeBSgenomeDataPkgFromNCBI().
This should be fixed in BSgenomeForge 1.2.1 (BioC release) and 1.3.1 (BioC devel). See https://github.com/Bioconductor/BSgenomeForge/commit/ad1a4896cff13309e05a70266c3282b03a4de0da Both versions should become available via BiocManager::install() in the next couple of days.
Also please note that:
- The recommendation is to supply the binomial name of the species to the organism argument. So "Ovis aries" in this case.
- According to the assembly report, MT is a circular sequence.
So:
forgeBSgenomeDataPkgFromNCBI("GCF_016772045.1",
pkg_maintainer="Jane Doe <janedoe@gmail.com>",
organism="Ovis aries",
circ_seqs="MT")
should work (with BSgenomeForge 1.2.1 or 1.3.1) and produce the BSgenome.Oaries.NCBI.ARSUIRambv2.0 package in the current directory.
Cheers, H.
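Since the fix landed in two separate version lines (1.2.1 for BioC release, 1.3.1 for BioC devel), a plain "at least 1.2.1" test would wrongly accept 1.3.0. The following is a minimal sketch of a check that accounts for this, using only base R. The helper name has_refseq_na_fix is hypothetical and not part of BSgenomeForge, and it assumes that later versions keep the fix.

```r
# Hypothetical helper (not part of BSgenomeForge): returns TRUE if `version`
# is one that should contain the fix described above, i.e. 1.2.1 or later
# within the 1.2.x release series, or 1.3.1 or later.
has_refseq_na_fix <- function(version) {
  v <- package_version(version)
  (v >= "1.2.1" && v < "1.3.0") || v >= "1.3.1"
}

# Typical use, assuming BSgenomeForge is installed:
# has_refseq_na_fix(packageVersion("BSgenomeForge"))
```

If this returns FALSE, updating the package with BiocManager::install("BSgenomeForge") before calling forgeBSgenomeDataPkgFromNCBI() is probably the first thing to try.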
Awesome, thanks for the effort and the suggestions. I will update you again in a couple of days.
Did this work for you with the latest BSgenomeForge?
Sorry for the delay. Yes, it works for me with the latest BSgenomeForge version.
Hi,
Could you help add the GCF_016772045.1_ARS-UI_Ramb_v2.0 genome to the list of registered assemblies?
We ran into the following error when running forgeBSgenomeDataPkg. Thanks!
Error in .make_Seqinfo_from_genome(genome) : "ARS-UI_Ramb_v2" is not a registered NCBI assembly or UCSC genome (use registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or UCSC assemblies/genomes currently registered in the GenomeInfoDb package)
Best, Yang
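For anyone hitting the same error, here is a rough sketch of how one might check up front whether an assembly name is registered. The helper is_registered_assembly is hypothetical. It assumes the data frame returned by registered_NCBI_assemblies() has an assembly column holding the assembly names, which is worth verifying against your installed GenomeInfoDb.

```r
# Hypothetical helper (not part of GenomeInfoDb). `registry` is assumed to be
# a data frame shaped like the result of
# GenomeInfoDb::registered_NCBI_assemblies(), with an `assembly` column
# holding the registered assembly names (an assumption, please verify).
is_registered_assembly <- function(genome, registry) {
  stopifnot(is.character(genome), length(genome) == 1L,
            "assembly" %in% colnames(registry))
  genome %in% registry$assembly
}

# Typical use, assuming GenomeInfoDb is installed:
# is_registered_assembly("ARS-UI_Ramb_v2",
#                        GenomeInfoDb::registered_NCBI_assemblies())
```

A FALSE result here would be consistent with the error above, where the lookup of "ARS-UI_Ramb_v2" fails because the assembly is not in the registry.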