Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

Registering ARS-UI_Ramb_v2.0 Genome #100

Closed edicliuyang closed 8 months ago

edicliuyang commented 10 months ago

Hi,

Could you please add the GCF_016772045.1_ARS-UI_Ramb_v2.0 genome to the list of registered assemblies?

We ran into the following error when running forgeBSgenomeDataPkg. Thanks!

Error in .make_Seqinfo_from_genome(genome) : "ARS-UI_Ramb_v2" is not a registered NCBI assembly or UCSC genome (use registered_NCBI_assemblies() or registered_UCSC_genomes() to list the NCBI or UCSC assemblies/genomes currently registered in the GenomeInfoDb package)
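
For reference, the lookup suggested by the error message can be sketched as below. This is only a minimal sketch, assuming GenomeInfoDb is installed: `find_registered_assembly()` is a hypothetical helper, not part of the package, and it assumes the data frame returned by registered_NCBI_assemblies() has `assembly` and `assembly_accession` columns.

```r
# Hypothetical helper (not part of GenomeInfoDb): keep the rows of a registry
# table whose assembly name or accession contains `pattern`.
# Assumes columns named `assembly` and `assembly_accession`.
find_registered_assembly <- function(registry, pattern) {
    # fixed = TRUE so that "." in an accession is matched literally
    hit <- grepl(pattern, registry$assembly, fixed = TRUE) |
           grepl(pattern, registry$assembly_accession, fixed = TRUE)
    registry[hit, , drop = FALSE]
}

# Only query the real registry if GenomeInfoDb is available.
if (requireNamespace("GenomeInfoDb", quietly = TRUE)) {
    registry <- GenomeInfoDb::registered_NCBI_assemblies()
    print(find_registered_assembly(registry, "ARS-UI_Ramb"))
}
```

A result with zero rows would suggest that the assembly is not registered under that name in the installed GenomeInfoDb version.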

Best, Yang

hpages commented 10 months ago

See https://github.com/Bioconductor/BSgenomeForge/issues/34#issuecomment-1718445966

edicliuyang commented 10 months ago

Hi Herve,

Thanks for the quick response! When I run the code above, I encounter this error. Is there a way to solve this problem?

forgeBSgenomeDataPkgFromNCBI("GCF_016772045.1",
pkg_maintainer="Jane Doe @.***>",
organism="ARSUIRambv2", circ_seqs=character(0))

Error in .sort_and_rename_fasta_sequences(dna, assembly_accession) : number of sequences in FASTA file does not match number of sequences in 'getChromInfoFromNCBI("GCF_016772045.1")'

Best,

Yang

hpages commented 10 months ago

Oops... indeed!

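The assembly report for GCF_016772045.1 (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/016/772/045/GCF_016772045.1_ARS-UI_Ramb_v2.0/GCF_016772045.1_ARS-UI_Ramb_v2.0_assembly_report.txt) has some peculiarities not seen before (NAs in the RefSeqAccn column) that break some of the sanity checks performed internally by forgeBSgenomeDataPkgFromNCBI().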
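
To see which sequences are affected, something like the following could be used. It is a rough sketch rather than what BSgenomeForge does internally: `summarize_missing_refseq()` is a hypothetical helper, and it assumes the table returned by getChromInfoFromNCBI() has `SequenceName` and `RefSeqAccn` columns. The real lookup needs GenomeInfoDb and network access to NCBI, so it is wrapped in a guard.

```r
# Hypothetical helper: summarise missing RefSeq accessions in a table shaped
# like the output of getChromInfoFromNCBI() (columns SequenceName, RefSeqAccn).
summarize_missing_refseq <- function(chrom_info) {
    missing <- is.na(chrom_info$RefSeqAccn)
    list(n_total       = nrow(chrom_info),
         n_missing     = sum(missing),
         missing_names = as.character(chrom_info$SequenceName[missing]))
}

# Fetch the real table only if GenomeInfoDb is installed; a failed download
# (for example, no network) is turned into NULL instead of an error.
if (requireNamespace("GenomeInfoDb", quietly = TRUE)) {
    chrom_info <- tryCatch(
        GenomeInfoDb::getChromInfoFromNCBI("GCF_016772045.1"),
        error = function(e) NULL
    )
    if (!is.null(chrom_info)) str(summarize_missing_refseq(chrom_info))
}
```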

This should be fixed in BSgenomeForge 1.2.1 (BioC release) and 1.3.1 (BioC devel); see https://github.com/Bioconductor/BSgenomeForge/commit/ad1a4896cff13309e05a70266c3282b03a4de0da for the commit. Both versions should become available via BiocManager::install() in the next couple of days.
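
A quick way to check whether the installed BSgenomeForge already includes the fix is sketched below. `has_refseq_na_fix()` is a hypothetical helper; the version thresholds come from the numbers quoted above, and treating every later version as fixed is an assumption.

```r
# Hypothetical helper: TRUE if `installed` is 1.2.1 or later on the release
# branch (1.2.x), or 1.3.1 or later. Devel version 1.3.0 predates the fix.
has_refseq_na_fix <- function(installed) {
    v <- package_version(installed)
    (v >= "1.2.1" && v < "1.3.0") || v >= "1.3.1"
}

if (requireNamespace("BSgenomeForge", quietly = TRUE)) {
    print(has_refseq_na_fix(packageVersion("BSgenomeForge")))
} else {
    message('BSgenomeForge is not installed; ',
            'BiocManager::install("BSgenomeForge") should install it.')
}
```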

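Also please note that:

  • The recommendation is to supply the binomial name of the species to the organism argument. So "Ovis aries" in this case.
  • According to the assembly report, MT is a circular sequence.

So: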

forgeBSgenomeDataPkgFromNCBI("GCF_016772045.1",
                             pkg_maintainer="Jane Doe <janedoe@gmail.com>",
                             organism="Ovis aries",
                             circ_seqs="MT")

should work (with BSgenomeForge 1.2.1 or 1.3.1) and produce the BSgenome.Oaries.NCBI.ARSUIRambv2.0 package in the current directory.
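
Once the package directory exists, it still has to be installed before it can be loaded. One possible way to do that from within R is sketched here; `install_forged_pkg()` is a hypothetical wrapper, and it assumes the forged source tree sits in a directory named after the package in the current working directory.

```r
# Hypothetical wrapper: install a forged BSgenome data package from its
# source directory. Returns FALSE (invisibly) if the directory is missing,
# and TRUE (invisibly) once install.packages() has been called; it does not
# verify that the installation itself succeeded.
install_forged_pkg <- function(pkg_dir) {
    if (!dir.exists(pkg_dir)) {
        message("No package directory found at: ", pkg_dir)
        return(invisible(FALSE))
    }
    # repos = NULL with type = "source" installs from a local source tree
    install.packages(pkg_dir, repos = NULL, type = "source")
    invisible(TRUE)
}

install_forged_pkg("BSgenome.Oaries.NCBI.ARSUIRambv2.0")
```

After a successful install, library(BSgenome.Oaries.NCBI.ARSUIRambv2.0) should load the genome. Running R CMD build and R CMD INSTALL on the directory from a shell is an alternative.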

Cheers, H.

edicliuyang commented 10 months ago

Awesome, thanks for the effort and the suggestions. I will update you again in a couple of days.

hpages commented 8 months ago

Did this work for you with the latest BSgenomeForge?

edicliuyang commented 8 months ago

Sorry for the delay. Yes, it works for me with the latest BSgenomeForge version.
