elizabethmcd / metabolisHMM

Tool for constructing phylogenies and summarizing metabolic characteristics based on curated and custom profile HMMs
GNU General Public License v3.0

Presence/absence among a given set of genomes #12

Closed elizabethmcd closed 4 years ago

elizabethmcd commented 5 years ago

Right now you can build a gene tree from an HMM, and the tree only includes genomes where the marker is present, which is nice for looking at the distribution/evolution of that marker. I also report presence/absence for a suite of metabolic markers. I like Mike Lee's example of showing presence/absence as highlights on a tree of all genomes, so you can broadly see where a given marker is NOT located.

Steps:

  1. Pass an assembly accession file to ncbi-genome-download. Metadata can also be pulled from here
  2. Call ORFs/annotate, since only nucleotide GenBank files are being pulled down
  3. Search for the marker across the predicted proteins
  4. Make a ribosomal protein tree of all genomes in the set
  5. Highlight presence/absence on the tree with color, and also output the number of genomes with the marker within each clade (such as phylum)
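
Steps 3 and 5 could be tied together roughly like this. This is only a sketch, not what metabolisHMM does: it assumes the search was run with hmmsearch `--tblout`, that protein IDs follow a hypothetical `<genome>|<protein>` naming scheme, and that a genome-to-clade mapping is available from the metadata. The function names are made up for illustration.

```python
from collections import defaultdict


def genomes_with_hits(tblout_lines, sep="|"):
    """Collect genome IDs that have at least one hit in hmmsearch --tblout output.

    Assumption: each target name looks like '<genome><sep><protein>'.
    """
    present = set()
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        target = line.split()[0]  # first column is the target (protein) name
        present.add(target.split(sep, 1)[0])
    return present


def summarize_by_clade(all_genomes, present, clade_of):
    """Return {clade: (genomes_with_marker, total_genomes)}.

    Genomes missing from clade_of are counted under 'unclassified'.
    """
    counts = defaultdict(lambda: [0, 0])
    for genome in all_genomes:
        clade = clade_of.get(genome, "unclassified")
        counts[clade][1] += 1
        if genome in present:
            counts[clade][0] += 1
    return {clade: tuple(pair) for clade, pair in counts.items()}
```

The `present` set gives the per-genome presence/absence to color on the tree, and the per-clade tuples give the counts to report alongside it.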
elizabethmcd commented 4 years ago

Have to add Prodigal as a dependency, and fix the header names for both .fna input and user-supplied .faa files, because I was only assuming .faa input before.
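
One way the header fix might look, as a rough sketch only: take the genome ID from the file name whether it ends in .fna or .faa, and prefix every FASTA header with it so hits can be traced back to a genome. The `<genome>|<record>` format and both function names are assumptions for illustration, not the current behavior of the tool.

```python
from pathlib import Path


def genome_id(path):
    """Genome ID = file name with a trailing .fna or .faa removed (assumed convention)."""
    p = Path(path)
    return p.stem if p.suffix in {".fna", ".faa"} else p.name


def prefix_headers(lines, gid, sep="|"):
    """Yield FASTA lines with '<gid><sep>' prepended to each header.

    Headers that already carry the prefix are left alone, so rerunning is safe.
    """
    for line in lines:
        if line.startswith(">") and not line.startswith(f">{gid}{sep}"):
            yield f">{gid}{sep}{line[1:]}"
        else:
            yield line
```

The same two helpers would apply to Prodigal output from a .fna file and to a .faa file supplied directly, so both paths end up with identically shaped headers.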