ash-bell / viral_phylogenetics

Making viral phylogeny trees using HMMs to search for viral hallmark genes

Suggestions #1

Open btemperton opened 4 years ago

btemperton commented 4 years ago

Looks great! I'd suggest a couple of improvements:

  1. I would recommend building trees from amino acids rather than nucleotides, unless you know you are comparing very similar phages.
  2. You could use MCL to define clusters of proteins to generate your HMMs, in case your genome is poorly annotated.
  3. Your MUSCLE and trimal alignments are doing different things. trimal with the -automated1 flag will remove the low-information gaps in the alignment. MUSCLE doesn't do that by default, so the output alignments will be quite different. I'd recommend sticking with the trimal version.
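
To illustrate point 3, here is a minimal sketch of what trimming gappy columns does to an alignment. It uses a fixed gap-fraction cutoff purely for illustration; it is not trimal's -automated1 heuristic, which picks its trimming method from the alignment itself. The function name `drop_gappy_columns` and the 0.5 threshold are assumptions made for this example.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of gap characters exceeds the cutoff.

    `alignment` is a list of equal-length aligned sequences (strings).
    The fixed cutoff is an illustrative assumption, not trimal's method.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Keep only the columns that are mostly residues rather than gaps.
    keep = [
        col
        for col in range(length)
        if sum(row[col] == "-" for row in alignment) / len(alignment)
        <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


# Toy protein alignment: the third column is a gap in two of three sequences.
aligned = ["MK-LV", "MKTLV", "M--LV"]
trimmed = drop_gappy_columns(aligned)
print(trimmed)
```

An untrimmed MUSCLE alignment keeps that third column, which is why the two outputs differ.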

ash-bell commented 4 years ago

  1. I agree with using proteins over nucleotides. However, hmmalign by default will try to find a match with every contig, so having whole-genome sequences and letting it pull out a match per phage seems to be the easier approach. Perhaps concatenating all the protein translations into a single contig per gene may work ...
  2. I like that idea, but hmmalign does a good job of finding related genes. If MCL drastically improves gene finding, then it may be useful.
  3. trimal by default doesn't convert contigs into an MSA, so MUSCLE/MAFFT is still needed. However, trimal does remove gaps, which may help HMM construction and save space. Will implement MUSCLE --> trimal -automated1 instead.
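
A rough sketch of the MUSCLE --> trimal -automated1 step, assuming both tools are on the PATH. The file names are placeholders and `build_commands` / `run_pipeline` are hypothetical helpers, not code from this repo. The MUSCLE call uses the v3-style `-in`/`-out` options; MUSCLE 5 changed its command line, so check the options for your installed version.

```python
import shlex
import subprocess


def build_commands(fasta_in, aligned_out, trimmed_out):
    """Return the two command lines: align with MUSCLE, then trim with trimal."""
    # MUSCLE v3-style options (assumption: v3 is installed).
    align = ["muscle", "-in", fasta_in, "-out", aligned_out]
    # trimal reads the alignment and chooses a trimming method automatically.
    trim = ["trimal", "-in", aligned_out, "-out", trimmed_out, "-automated1"]
    return [align, trim]


def run_pipeline(fasta_in, aligned_out, trimmed_out):
    """Run both steps in order, stopping if either tool exits with an error."""
    for cmd in build_commands(fasta_in, aligned_out, trimmed_out):
        subprocess.run(cmd, check=True)


# Placeholder file names; nothing is executed here, the commands are only printed.
commands = build_commands(
    "hallmark_gene.faa", "hallmark_gene.aln.faa", "hallmark_gene.trimmed.faa"
)
for cmd in commands:
    print(shlex.join(cmd))
```

Calling `run_pipeline(...)` with real paths would execute the two steps; the trimmed alignment is then what gets passed on to HMM construction.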