ARGA-Genomes / arga-data

ARGA
Mozilla Public License 2.0

Data: NCBI genome sequences #2

Open nickdos opened 2 years ago

nickdos commented 2 years ago

NCBI genomes FTP: https://ftp.ncbi.nih.gov/genomes/

FTP FAQ: https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/

List of RefSeq sequences: https://ftp.ncbi.nih.gov/genomes/refseq/assembly_summary_refseq.txt
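
A rough sketch of how the summary file could be filtered by taxid once it has been downloaded. It assumes the file is tab-separated and that the last `#` line before the data holds the column names (including `assembly_accession` and `taxid`). The function names are hypothetical, and the sample rows are cut down to three columns for illustration; the second row is a placeholder, not real data.

```python
from typing import Iterable, Iterator


def read_assembly_summary(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per assembly row, keyed by the file's own column names."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: the last comment line before the data is the header.
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        yield dict(zip(header, line.split("\t")))


def assemblies_for_taxid(lines: Iterable[str], taxid: str) -> list:
    """Return every row whose taxid column equals the given taxid."""
    return [row for row in read_assembly_summary(lines) if row.get("taxid") == taxid]


# Abbreviated sample in the assumed layout (the real file has many more columns).
sample = [
    "#   See the README for a description of the columns in this file.\n",
    "# assembly_accession\ttaxid\torganism_name\n",
    "GCF_002099425.1\t38626\tPhascolarctos cinereus\n",
    "GCF_000000000.1\t0\tPlaceholder organism\n",
]

koala_rows = assemblies_for_taxid(sample, "38626")
print(koala_rows[0]["assembly_accession"])
```

With the real file, the same functions could be fed an open file handle instead of the `sample` list.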

Example: directory for Koala (Phascolarctos cinereus)

Example Koala metadata:

ORGANISM NAME:  Phascolarctos cinereus
ORGANISM COMMON NAME:   koala
TAXID:  38626
ANNOTATION RELEASE NAME:    NCBI Phascolarctos cinereus Annotation Release 100
ANNOTATION EVIDENCE FREEZE DATE:    24-April-2017
ANNOTATION RELEASE DATE:    10-May-2017
ANNOTATION REPORT:  https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Phascolarctos_cinereus/100
ANNOTATED ASSEMBLIES:
* REFERENCE:
ASSEMBLY NAME:  phaCin_unsw_v4.1
ASSEMBLY ACCESSION: GCF_002099425.1
ASSEMBLY SUBMITTER: The Earlham Institute
ASSEMBLY DATE:  18 April 2017
ASSEMBLY TYPE:  Haploid
NUMBER OF ASSEMBLY-UNITS:   2
##Below is a 2 column list with assembly-unit id and name.
##The Primary Assembly unit is listed first.
GCF_002099505.1 Primary Assembly
GCF_000057195.1 non-nuclear
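
The metadata above is mostly `KEY: value` lines followed by a two-column list of assembly units, so it could be read with something like the sketch below. This is only an illustration based on the koala example: the function name is made up, and it assumes keys are upper case, `##` lines are comments, and any other line is an assembly-unit row.

```python
def parse_annotation_readme(text: str):
    """Split the metadata into a dict of fields and a list of assembly units."""
    fields = {}
    units = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("##"):
            continue  # blank line or explanatory comment
        key, sep, value = line.partition(":")
        key = key.lstrip("* ").strip()
        if sep and key.isupper():
            # partition() splits on the first colon only, so URLs stay intact.
            fields[key] = value.strip()
        else:
            # Assumption: everything else is "<unit accession> <unit name>".
            parts = line.split(None, 1)
            units.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return fields, units


readme_sample = """\
ORGANISM NAME:  Phascolarctos cinereus
TAXID:  38626
ANNOTATION REPORT:  https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Phascolarctos_cinereus/100
ASSEMBLY ACCESSION: GCF_002099425.1
NUMBER OF ASSEMBLY-UNITS:   2
##The Primary Assembly unit is listed first.
GCF_002099505.1 Primary Assembly
GCF_000057195.1 non-nuclear
"""

fields, units = parse_annotation_readme(readme_sample)
print(fields["TAXID"], fields["ASSEMBLY ACCESSION"], units)
```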

Assembly stats:

# Assembly Statistics Report
# Assembly name:  phaCin_unsw_v4.1
# Organism name:  Phascolarctos cinereus (koala)
# Isolate:  Bilbo 61053
# Sex:  female
# Taxid:          38626
# BioSample:      SAMN06198159
# BioProject:     PRJNA359763
# Submitter:      The Earlham Institute
# Date:           2017-04-18
# Assembly type:  haploid
# Release type:   major
# Assembly level: Contig
# Genome representation: full
# WGS project:    MSTS01
# Assembly method: Falcon v. 0.3.0
# Expected final version: yes
# Genome coverage: 57.3x
# Sequencing technology: PacBio
# RefSeq category: Representative Genome
# GenBank assembly accession: GCA_002099425.1
# RefSeq assembly accession: GCF_002099425.1
# RefSeq assembly and GenBank assemblies identical: no
# Reporting on RefSeq assembly.
#
## Assembly-Units:
## GenBank Unit Accession   RefSeq Unit Accession   Assembly-Unit name
## GCA_002099505.1  GCF_002099505.1 Primary Assembly
##  GCF_000057195.1 non-nuclear
#
# Statistic Types
# Statistic Description
# component-count   Number of unique components
# contig-L50    Number of contigs that are longer than, or equal to, the N50 length
# contig-N50    Contig length at which 50% of total bases in assembly are in contigs of that length or greater
# contig-count  Number of contigs
# gc-perc   gc-count/atgc-count as percentage
# molecule-count    Number of chromosomes and plasmids in full assembly
# region-count  Number of genomic regions defined in full assembly
# spanned-gaps  Number of spanned gaps. Spanned gaps are gaps within a scaffold
# top-level-count   Number of chromosomes or plasmids, unplaced/unlocalized scaffolds, alt-loci scaffolds, and patch scaffolds
# total-gap-length  Total length of gaps
# total-length  Total sequence length including bases and gaps
# ungapped-length   Total length excluding gaps
# unspanned-gaps    Number of unspanned gaps. Unspanned gaps are gaps between scaffolds
#
# Sequence-type Description
# all   statistic covers all the sequences in the unit-assembly and molecule(s) specified.
# molecule  statistic covers the specified molecule. molecule-name and molecule-type/loc will be given.
# unlocalized   statistic covers the sequences assigned to a molecule but with no position. molecule-name and molecule-type/loc will be given.
# unplaced  statistic covers the sequences not assigned to any molecule in the assembly.
#
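
The `# Key: value` lines at the top of the stats report carry most of the fields we would likely want (taxid, BioSample, both accessions). A minimal sketch of pulling them out is below. The function name and regex are assumptions based only on the koala report above: lines starting with `##`, and `#` lines without a colon, are skipped.

```python
import re

# "# Key: value" where the key contains no colon or hash and the value is non-empty.
HEADER_LINE = re.compile(r"^# ([^:#]+):\s*(\S.*)$")


def parse_stats_header(lines):
    """Collect the key/value pairs from the commented header of a stats report."""
    header = {}
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line: the header is over
        match = HEADER_LINE.match(line.rstrip())
        if match:
            header[match.group(1).strip()] = match.group(2).strip()
    return header


stats_sample = [
    "# Assembly Statistics Report",
    "# Assembly name:  phaCin_unsw_v4.1",
    "# Taxid:          38626",
    "# Assembly level: Contig",
    "# GenBank assembly accession: GCA_002099425.1",
    "# RefSeq assembly accession: GCF_002099425.1",
    "# Reporting on RefSeq assembly.",
    "#",
    "## Assembly-Units:",
    "# Statistic Types",
]

stats_header = parse_stats_header(stats_sample)
print(stats_header["Taxid"], stats_header["RefSeq assembly accession"])
```

Since the header lists both the GenBank (GCA) and RefSeq (GCF) accessions, the resulting dict could be used to cross-reference the two.
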
nickdos commented 2 years ago

Fixes list: