Cluster specific mutations

One way to finish this functionality is to call Biopython's "dumb" consensus function inside a single for loop over the cluster ids. If we keep the sequences as SeqRecord instances when we read them from the alignment, the algorithm looks like:

Read sequences as SeqRecords into a mapping of strain name to SeqRecord
Read a mapping of strain name to cluster id
For each cluster id in cluster ids:
1. Create list of SeqRecords from strains in the cluster
2. Create MultipleSeqAlignment from records list
3. Create dumb consensus from MultipleSeqAlignment
4. Write dumb consensus (named by cluster id) to open consensus FASTA file handle
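
The loop above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses plain strings instead of SeqRecords and a hand-written per-column majority vote as a stand-in for Biopython's dumb consensus, so the function names (`column_consensus`, `cluster_consensus`, `write_fasta`), the 0.7 threshold, and the "N" ambiguity character are all hypothetical choices rather than the script's actual behavior.

```python
from collections import Counter, defaultdict


def column_consensus(sequences, threshold=0.7, ambiguous="N"):
    """Majority-vote consensus over equal-length aligned sequences.

    Stand-in for Biopython's dumb consensus; threshold and ambiguous
    character are assumptions chosen for this sketch.
    """
    consensus = []
    for column in zip(*sequences):
        # Ignore gap characters when counting residues in this column.
        counts = Counter(base for base in column if base != "-")
        if not counts:
            consensus.append(ambiguous)
            continue
        ranked = counts.most_common(2)
        base, count = ranked[0]
        tie = len(ranked) > 1 and ranked[1][1] == count
        # Keep the top residue only if it is unique and frequent enough.
        if not tie and count / sum(counts.values()) >= threshold:
            consensus.append(base)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


def cluster_consensus(sequence_by_strain, cluster_by_strain):
    """Return a mapping of cluster id to consensus sequence."""
    strains_by_cluster = defaultdict(list)
    for strain, cluster in cluster_by_strain.items():
        # Skip strains that have a cluster label but no aligned sequence.
        if strain in sequence_by_strain:
            strains_by_cluster[cluster].append(strain)
    return {
        cluster: column_consensus([sequence_by_strain[s] for s in strains])
        for cluster, strains in strains_by_cluster.items()
    }


def write_fasta(consensus_by_cluster, handle):
    """Write one FASTA record per cluster, named by cluster id."""
    for cluster, sequence in consensus_by_cluster.items():
        handle.write(f">{cluster}\n{sequence}\n")
```

Grouping strains by cluster once, before the loop, avoids rescanning the full strain list for every cluster id. In the real script the inner call would presumably be replaced by the Biopython consensus on a MultipleSeqAlignment, as the steps describe.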

We also want to choose which metadata column supplies the cluster id through a --group-by argument to the consensus script, so we can pass in "MCC", "clade_membership", "mds_label", etc. The update proposed to the MERS Snakefile in this PR shows an example of how we want to parameterize the Snakemake rule for consensus sequences by embedding method, which gives us cluster-specific mutations per method. The final consensus table will need a column for the embedding method in addition to the pathogen, position, and mutation information that it already includes.
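
A minimal sketch of how a --group-by argument could select the cluster column is shown here. Only the `--group-by` flag comes from the discussion above; the `--metadata` flag, the tab-separated format, the `strain` column name, and the helper names `build_parser` and `read_clusters` are assumptions made for illustration.

```python
import argparse
import csv


def build_parser():
    """Build a hypothetical command-line parser for the consensus script."""
    parser = argparse.ArgumentParser(
        description="Build per-cluster consensus sequences."
    )
    # Assumed flag: path to a tab-separated metadata table.
    parser.add_argument("--metadata", required=True,
                        help="TSV file with one row per strain")
    parser.add_argument("--group-by", required=True,
                        help="metadata column holding cluster ids, "
                             "e.g. MCC, clade_membership, or mds_label")
    return parser


def read_clusters(handle, group_by, strain_column="strain"):
    """Return a mapping of strain name to cluster id from a TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    for column in (strain_column, group_by):
        if column not in fieldnames:
            raise ValueError(f"metadata has no column named {column!r}")
    # Strains with an empty cluster label are left out of the mapping.
    return {
        row[strain_column]: row[group_by]
        for row in reader
        if row[group_by] != ""
    }
```

Because argparse converts `--group-by` to the attribute `group_by`, a script could call `read_clusters(handle, args.group_by)` and hand the resulting mapping to the consensus loop unchanged, whichever embedding method produced the labels.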

blab / cartography

Cluster specific mutations #25