marwa38 closed this issue 11 months ago.
There are some Atlantic salmon resources in AnnotationHub. You can search for them with query(), where ah is the hub object returned by AnnotationHub():
> query(ah, c("salmo", "salar"))
AnnotationHub with 82 records
# snapshotDate(): 2023-06-23
# $dataprovider: Ensembl, FANTOM5,DLRP,IUPHAR,HPRD,STRING,SWISSPROT,TREMBL,E...
# $species: salmo salar, Salmo salar
# $rdataclass: GRanges, TwoBitFile, EnsDb, SQLiteFile, OrgDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH78891"]]'
title
AH78891 | Ensembl 99 EnsDb for Salmo salar
AH79444 | Salmo_salar.ICSASG_v2.99.abinitio.gtf
AH79445 | Salmo_salar.ICSASG_v2.99.chr.gtf
AH79446 | Salmo_salar.ICSASG_v2.99.gtf
AH79796 | Ensembl 100 EnsDb for Salmo salar
... ...
AH111196 | Salmo_salar.Ssal_v3.1.109.abinitio.gtf
AH111197 | Salmo_salar.Ssal_v3.1.109.chr.gtf
AH111198 | Salmo_salar.Ssal_v3.1.109.gtf
AH111452 | LRBaseDb for Salmo salar (Atlantic salmon, v005)
AH111638 | org.Salmo_salar.eg.sqlite
You can also search by the common name:
> query(ah, c("salmon", "atlantic"))
AnnotationHub with 5 records
# snapshotDate(): 2023-06-23
# $dataprovider: FANTOM5,DLRP,IUPHAR,HPRD,STRING,SWISSPROT,TREMBL,ENSEMBL,CE...
# $species: Salmo salar
# $rdataclass: SQLiteFile
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH91764"]]'
title
AH91764 | LRBaseDb for Salmo salar (Atlantic salmon, v001)
AH97832 | LRBaseDb for Salmo salar (Atlantic salmon, v002)
AH100540 | LRBaseDb for Salmo salar (Atlantic salmon, v003)
AH107261 | LRBaseDb for Salmo salar (Atlantic salmon, v004)
AH111452 | LRBaseDb for Salmo salar (Atlantic salmon, v005)
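Once a record is chosen, [[ downloads it on first use and caches it locally. Record IDs such as AH111638 are tied to a snapshot, so the minimal sketch below looks up the most recently added OrgDb and EnsDb for Salmo salar instead of hard-coding IDs. latest_record() is a hypothetical helper written for this example, and it assumes the ensembldb package is installed so the EnsDb can be loaded.

```r
library(AnnotationHub)
library(AnnotationDbi)
library(ensembldb)

ah <- AnnotationHub()  # current snapshot; IDs may differ from the listings above

# Hypothetical helper: ID of the most recently added record that matches
# all of the given search patterns.
latest_record <- function(hub, patterns) {
  hits <- query(hub, patterns)
  if (length(hits) == 0L)
    stop("no AnnotationHub records match: ", toString(patterns))
  added <- as.Date(hits$rdatadateadded)
  names(hits)[which.max(added)]
}

# [[ downloads the resource on first use and caches it for later sessions
orgdb <- ah[[latest_record(ah, c("OrgDb", "Salmo salar"))]]
edb   <- ah[[latest_record(ah, c("EnsDb", "Salmo salar"))]]

keytypes(orgdb)  # identifier types the OrgDb can be queried by
genes(edb)       # Ensembl gene models as a GRanges
```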
@marwa38
FWIW I added the Salmo_salar term to the biocViews vocabulary: https://github.com/Bioconductor/biocViews/blob/cbf0ec7d111b5f244e51ff2a95b48068b6e86ed8/inst/dot/biocViewsVocab.dot#L256. Note that the Salmo_salar view won't show up on the BiocViews page until at least one package adds the Salmo_salar term to its biocViews field.
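A package opts into the view by listing the term in the comma-separated biocViews field of its DESCRIPTION file, e.g. biocViews: AnnotationData, Salmo_salar. As a small illustration, here is a base-R sketch that checks whether a DESCRIPTION file lists a given term; has_biocview() and the SalmoSalarAnnotation package are hypothetical names used only for this example.

```r
# Hypothetical helper (base R only): is `term` listed in the biocViews
# field of the DESCRIPTION file at `desc_file`?
has_biocview <- function(desc_file, term) {
  views <- read.dcf(desc_file, fields = "biocViews")[1L, "biocViews"]
  if (is.na(views)) return(FALSE)  # field absent
  # biocViews is a comma-separated list that may wrap over several lines
  term %in% trimws(strsplit(views, ",", fixed = TRUE)[[1L]])
}

# DESCRIPTION of a hypothetical annotation package tagged with the new term
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: SalmoSalarAnnotation",
  "Version: 0.99.0",
  "biocViews: AnnotationData, Salmo_salar"
), desc_file)

has_biocview(desc_file, "Salmo_salar")
# [1] TRUE
```

Matching whole terms rather than substrings means a term such as Salmo on its own does not count as Salmo_salar.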
I also registered a few Salmo salar NCBI assemblies in the GenomeInfoDb package:
library(GenomeInfoDb)
registered_NCBI_assemblies("salmo")[ , c(1:3, 5)]
# organism assembly date assembly_accession
# 1 Salmo salar Ssal_v3.1 2021/04/21 GCF_905237065.1
# 2 Salmo salar USDA_NASsal_1.1 2022/01/12 GCA_021399835.1
# 3 Salmo salar Ssal_Brian_v1.0 2022/04/01 GCA_923944775.1
# 4 Salmo salar Ssal_ALTA 2022/05/11 GCA_931346935.2
This allows you to easily retrieve chromosome/scaffold names and attributes for a given assembly:
ssal_chrom_info <- getChromInfoFromNCBI("Ssal_v3.1")
dim(ssal_chrom_info)
# [1] 4011 10
ssal_chrom_info[1:10, c(1:2, 8, 10)]
# SequenceName SequenceRole SequenceLength circular
# 1 ssa01 assembled-molecule 174498729 FALSE
# 2 ssa02 assembled-molecule 95481959 FALSE
# 3 ssa03 assembled-molecule 105780080 FALSE
# 4 ssa04 assembled-molecule 90536438 FALSE
# 5 ssa05 assembled-molecule 92788608 FALSE
# 6 ssa06 assembled-molecule 96060288 FALSE
# 7 ssa07 assembled-molecule 68862998 FALSE
# 8 ssa08 assembled-molecule 28860523 FALSE
# 9 ssa09 assembled-molecule 161282225 FALSE
# 10 ssa10 assembled-molecule 125877811 FALSE
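Most GenomicRanges-based code expects a Seqinfo object rather than a data frame. A minimal sketch, assuming the assembled.molecules.only and as.Seqinfo arguments of getChromInfoFromNCBI() (see ?getChromInfoFromNCBI), that keeps only the assembled chromosomes and uses their lengths to tile the genome; the 1 Mb bin size is an arbitrary choice for illustration.

```r
library(GenomeInfoDb)
library(GenomicRanges)

# Keep only sequences whose SequenceRole is "assembled-molecule" and
# return them as a Seqinfo object instead of a data frame.
ssal_seqinfo <- getChromInfoFromNCBI("Ssal_v3.1",
                                     assembled.molecules.only = TRUE,
                                     as.Seqinfo = TRUE)
ssal_seqinfo

# Split the assembled chromosomes into bins of at most 1 Mb; the last bin
# of each chromosome is cut at the chromosome end.
ssal_lengths <- seqlengths(ssal_seqinfo)
bins <- tileGenome(ssal_lengths, tilewidth = 1e6,
                   cut.last.tile.in.chrom = TRUE)
bins
```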
Finally, it also makes it super easy to forge a BSgenome package for a given assembly, using the BSgenomeForge package:
library(BSgenomeForge)
forgeBSgenomeDataPkgFromNCBI(assembly_accession="GCF_905237065.1",
pkg_maintainer="Jane Doe <janedoe@gmail.com>")
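The forge step produces a package source tree that still has to be installed before the genome can be used. A minimal sketch of doing that from R, assuming forgeBSgenomeDataPkgFromNCBI() accepts a destdir argument (check ?forgeBSgenomeDataPkgFromNCBI) and that the forged tree is the only subdirectory of a fresh folder. Forging downloads the whole assembly sequence, so expect a large download and some waiting.

```r
library(BSgenomeForge)
library(BSgenome)

# Forge into a fresh, empty folder so the new package source tree is easy to find
forge_dir <- file.path(tempdir(), "ssal_forge")
dir.create(forge_dir, showWarnings = FALSE)
forgeBSgenomeDataPkgFromNCBI(assembly_accession = "GCF_905237065.1",
                             pkg_maintainer = "Jane Doe <janedoe@gmail.com>",
                             destdir = forge_dir)

# The folder should now contain exactly one package source tree,
# named after the forged package
pkg_dir <- list.dirs(forge_dir, recursive = FALSE)
stopifnot(length(pkg_dir) == 1L)

# Install from the source directory, then load the genome by package name
install.packages(pkg_dir, repos = NULL, type = "source")
ssal_genome <- getBSgenome(basename(pkg_dir))
ssal_genome
```

Building and checking the tree from a shell with R CMD build, R CMD check and R CMD INSTALL works just as well and is the better route if you plan to share the package.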
Let us know if that addresses your issue so we can close. Thanks!
Thanks so much :)
Could you please add Atlantic salmon as an annotation file under Organism in Bioconductor? https://bioconductor.org/packages/release/BiocViews.html#___Organism Many thanks in advance.