Open btemperton opened 4 years ago
`hmmalign` will try to find a match with every contig. Therefore, having whole genome sequences and letting it pull out a match per phage seems to be the easier approach. Perhaps concatenating all the protein translations into a single contig per gene may work ...
`hmmalign` does a good job of finding related genes. If MCL drastically improves gene finding then it may be useful.
`trimal` by default doesn't convert contigs into an MSA, therefore MUSCLE/MAFFT is needed. However, `trimal` doesn't remove gaps, and removing them may help HMM construction and save space. Will implement a `MUSCLE --> trimal -automated1` pipeline instead.
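
For reference, the MUSCLE --> trimal -automated1 step proposed above could be wired up roughly like this. This is only a sketch under a few assumptions: the MUSCLE call uses v3-style `-in`/`-out` options (MUSCLE v5 uses `-align`/`-output` instead), and the function name, the input file `cluster_001.faa`, and the output naming scheme are all hypothetical. It prints the commands rather than running them, so it works without either tool installed.

```python
import shlex
from pathlib import Path


def build_align_trim_commands(proteins_fasta, out_dir):
    """Return the two commands for a MUSCLE --> trimal -automated1 run."""
    out_dir = Path(out_dir)
    stem = Path(proteins_fasta).stem
    aligned = out_dir / f"{stem}.aln.fasta"      # hypothetical naming scheme
    trimmed = out_dir / f"{stem}.trimmed.fasta"  # hypothetical naming scheme

    # Assumption: MUSCLE v3 syntax; MUSCLE v5 uses -align/-output instead.
    muscle_cmd = ["muscle", "-in", str(proteins_fasta), "-out", str(aligned)]
    # trimAl takes the MUSCLE alignment as input and trims it with -automated1.
    trimal_cmd = ["trimal", "-in", str(aligned), "-out", str(trimmed), "-automated1"]
    return [muscle_cmd, trimal_cmd]


if __name__ == "__main__":
    # cluster_001.faa is a placeholder for one gene cluster's protein FASTA.
    for cmd in build_align_trim_commands("cluster_001.faa", "alignments"):
        # Print rather than execute, so the sketch runs without the tools installed.
        print(shlex.join(cmd))
```

To actually run the steps, each command list could be passed to `subprocess.run(cmd, check=True)` in order.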
Looks great! I'd suggest a couple of improvements:
The `muscle` and `trimal` alignments are doing different things. `trimal` with the `automated1` flag will remove the low-information gaps in the alignment. Muscle doesn't do that by default, so the output alignments will be quite different. I'd recommend sticking with the `trimal` version.
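
To make the difference concrete, here is a toy illustration of what trimming gappy columns does to an alignment. It is not trimAl's `automated1` heuristic: a simple fixed gap-fraction cutoff stands in for it, and the function name, the 0.5 threshold, and the three example sequences are all made up for illustration.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction."""
    if not alignment:
        return []
    n_rows = len(alignment)
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column only if the share of '-' characters is within the cutoff.
    keep = [
        col
        for col in range(n_cols)
        if sum(row[col] == "-" for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


# Untrimmed alignment, as an aligner alone would leave it (made-up sequences).
aligned = [
    "MK-TLLA",
    "MRQTL-A",
    "MK-TL-A",
]
trimmed = drop_gappy_columns(aligned)
print(trimmed)  # ['MKTLA', 'MRTLA', 'MKTLA']
```

The two columns that are mostly gaps disappear, so the trimmed alignment is shorter than the untrimmed one, which is why the two outputs are not directly comparable.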