Closed: PhyloGrok closed this issue 6 years ago.
Generate a stat summary comparing RefSeq vs. non-RefSeq assemblies for "Primates (24)", "Rodents (20)", "Insects (92)", and "Other Invertebrates (27)" (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/all/)
These could be parsed from the "Genome Reports" files:
ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/
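A minimal Python sketch of parsing one of those reports, assuming a tab-delimited layout with a single `#`-prefixed header line. The column names (`Organism/Name`, `SubGroup`, `Assembly Accession`) and the rows in `SAMPLE_REPORT` are illustrative placeholders, not copied from NCBI; check the header of the actual report file before relying on them. `read_genome_report` and `count_by` are hypothetical helper names.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in an assumed GENOME_REPORTS-style layout
# (tab-delimited, header prefixed with '#'); placeholder data, not real NCBI rows.
SAMPLE_REPORT = (
    "#Organism/Name\tTaxID\tGroup\tSubGroup\tAssembly Accession\n"
    "Species A\t1001\tAnimals\tMammals\tGCA_000000001.1\n"
    "Species B\t1002\tAnimals\tInsects\tGCF_000000002.1\n"
    "Species C\t1003\tAnimals\tInsects\tGCA_000000003.1\n"
)


def read_genome_report(handle):
    """Parse a tab-delimited report whose first line is a '#'-prefixed header."""
    header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t")
    return list(reader)


def count_by(rows, column):
    """Count rows per distinct value of `column` (e.g. 'SubGroup')."""
    return Counter(row[column] for row in rows)


rows = read_genome_report(io.StringIO(SAMPLE_REPORT))
print(count_by(rows, "SubGroup"))
```

For a downloaded report, pass an open file handle instead of the `StringIO` object.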
Data for the following taxa:
User summary of genome assembly stats:
Reads out a list of "Primates" with Assembly db data from the 'Meta' block of the assembly file
(columns = Organism, SpeciesTaxid, GbUid, AssemblyAccession, RefSeq_category, AssemblyStatus, contig_count, contig_l50, contig_n50)
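A sketch of turning one assembly record into those columns, assuming each record has already been fetched as a dict (for example from an Entrez esummary of the assembly database; the retrieval step is omitted) and that its `Meta` field is an XML fragment containing `<Stat category="..." sequence_tag="all">` elements. That layout, the example values, and the helper names `parse_meta_stats` / `summary_to_row` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

COLUMNS = ["Organism", "SpeciesTaxid", "GbUid", "AssemblyAccession",
           "RefSeq_category", "AssemblyStatus",
           "contig_count", "contig_l50", "contig_n50"]
STAT_FIELDS = ("contig_count", "contig_l50", "contig_n50")


def parse_meta_stats(meta):
    """Return {category: int} for whole-assembly stats in a 'Meta' fragment.

    The fragment (assumed layout) has several top-level elements, so it is
    wrapped in a dummy root element before parsing.
    """
    root = ET.fromstring("<Meta>" + meta + "</Meta>")
    stats = {}
    for stat in root.iter("Stat"):
        if stat.get("sequence_tag") == "all":
            stats[stat.get("category")] = int(stat.text)
    return stats


def summary_to_row(summary):
    """Flatten one assembly summary dict into the columns listed above."""
    stats = parse_meta_stats(summary["Meta"])
    row = {col: summary.get(col, "") for col in COLUMNS[:6]}
    for field in STAT_FIELDS:
        row[field] = stats.get(field)  # None if the stat is missing
    return row


# Hypothetical record; all values are placeholders.
example = {
    "Organism": "Example species",
    "SpeciesTaxid": "9999",
    "GbUid": "123",
    "AssemblyAccession": "GCF_000000001.1",
    "RefSeq_category": "representative genome",
    "AssemblyStatus": "Scaffold",
    "Meta": ' <Stats> <Stat category="contig_count" sequence_tag="all">1500</Stat>'
            ' <Stat category="contig_l50" sequence_tag="all">120</Stat>'
            ' <Stat category="contig_n50" sequence_tag="all">250000</Stat> </Stats>'
            ' <FtpSites> </FtpSites> ',
}
print(summary_to_row(example))
```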
- Format the output as .csv with column headers, then use R to make a graphical comparison of RefSeq vs. non-RefSeq assemblies (i.e., an ANOVA of contig parameters between taxa levels; R/ggplot?)
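A Python sketch of the CSV step plus a hand-computed one-way ANOVA F statistic, as a quick check before moving to R. The `taxon` column, the toy N50 values, and the helper names are hypothetical; the same function could group by `RefSeq_category` instead of `taxon`.

```python
import csv
import io
from collections import defaultdict


def write_csv(rows, columns, handle):
    """Write dict rows as CSV, header line first."""
    writer = csv.DictWriter(handle, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)


def one_way_anova_f(rows, group_col, value_col):
    """One-way ANOVA F statistic of value_col across levels of group_col.

    Needs at least two groups and some within-group variation.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2
                     for name, g in groups.items())
    ss_within = sum((v - means[name]) ** 2
                    for name, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy values, not real assembly statistics.
toy_rows = [
    {"taxon": "Primates", "contig_n50": 10},
    {"taxon": "Primates", "contig_n50": 12},
    {"taxon": "Primates", "contig_n50": 14},
    {"taxon": "Rodents", "contig_n50": 20},
    {"taxon": "Rodents", "contig_n50": 22},
    {"taxon": "Rodents", "contig_n50": 24},
]

# Round-trip through CSV, as the R script would read the file back in.
buffer = io.StringIO()
write_csv(toy_rows, ["taxon", "contig_n50"], buffer)
buffer.seek(0)
f_stat = one_way_anova_f(list(csv.DictReader(buffer)), "taxon", "contig_n50")
print(f_stat)
```

For a real file, open it with `newline=""` as the `csv` module recommends. On the R side, `summary(aov(contig_n50 ~ taxon, data = df))` fits the same one-way model, and a ggplot2 `geom_boxplot()` split by `RefSeq_category` would give the graphical comparison. Contig N50 values tend to be right-skewed, so testing on a log scale may be worth considering.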