For each processed batch, an annotated tree must be constructed for each gene. Currently, an R script handles the alignment, tree construction and plotting.
Input:
A multi-FASTA file with headers such as XXXX.4 and XXXX.6.
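A minimal sketch of reading this input and merging it with a reference set, assuming the sample ID is the first whitespace-separated token of each header and that sample and reference IDs must not overlap. read_fasta and combine_with_references are hypothetical helper names; Biopython's SeqIO.parse(handle, "fasta") is a common alternative for the parsing half.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence ID, sequence) pairs from multi-FASTA lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("empty FASTA header")
            # Keep only the first token, e.g. ">XXXX.4 description" -> "XXXX.4"
            header = tokens[0]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def combine_with_references(
    samples: Dict[str, str], references: Dict[str, str]
) -> Dict[str, str]:
    """Merge sample and reference sequences, refusing IDs present in both."""
    clashes = samples.keys() & references.keys()
    if clashes:
        raise ValueError(f"IDs present in both sets: {sorted(clashes)}")
    return {**references, **samples}
```

Rejecting clashing IDs up front avoids a sample silently overwriting a reference (or vice versa) before the tree is built.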
Output:
A PDF of the annotated tree(s), with references colored red and samples colored blue. Extra information, such as how the trees were generated (methods) and the date of generation, would be useful.
A TSV of the Closest Prototypic Virus (CPV) for each sample, with the headers seqno and result, where result is the CPV.
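The non-plotting parts of these two outputs can be sketched as follows: one color per tree tip (red for references, blue for samples) and the seqno/result table. tip_colors and write_cpv_tsv are hypothetical names, and the CPV names in the demo are placeholders; the color list would be handed to whatever tip-styling option the plotting code uses.

```python
import csv
import io
from typing import Iterable, List, Mapping, Set, TextIO

REFERENCE_COLOR = "red"  # references are colored red
SAMPLE_COLOR = "blue"    # samples are colored blue


def tip_colors(tip_names: Iterable[str], reference_ids: Set[str]) -> List[str]:
    """Return one color per tip name, in the same order as the tips."""
    return [REFERENCE_COLOR if name in reference_ids else SAMPLE_COLOR for name in tip_names]


def write_cpv_tsv(cpv_by_seqno: Mapping[str, str], handle: TextIO) -> None:
    """Write the CPV table (headers seqno, result), one row per sample in input order."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["seqno", "result"])
    for seqno, cpv in cpv_by_seqno.items():
        writer.writerow([seqno, cpv])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_cpv_tsv({"XXXX.4": "CPV_A", "XXXX.6": "CPV_B"}, buffer)
    print(buffer.getvalue(), end="")
    print(tip_colors(["REF_1", "XXXX.4"], {"REF_1"}))
```

Using the csv module with an explicit tab delimiter keeps quoting correct if a sequence ID ever contains unusual characters.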
The subcommand handles the reference FASTA (detected from the lineage). The package must include FASTA datasets for each subtype and each gene.
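Locating the bundled reference set might look like the sketch below. The lineage-to-subtype table and the <subtype>/<gene>.fasta layout are hypothetical, since the actual detection rule and package layout are not specified. In an installed package, data_dir could come from importlib.resources.files(<package>) / "references", whose result also supports the / operator and is_file().

```python
from pathlib import Path
from typing import Dict

# Hypothetical lineage -> subtype table; the real detection rule is not specified here.
LINEAGE_TO_SUBTYPE: Dict[str, str] = {
    "lineage_1": "subtype_1",
    "lineage_2": "subtype_2",
}


def reference_fasta_path(data_dir: Path, lineage: str, gene: str) -> Path:
    """Locate the bundled reference FASTA for a lineage and gene.

    Assumes a hypothetical package layout: <data_dir>/<subtype>/<gene>.fasta
    """
    subtype = LINEAGE_TO_SUBTYPE.get(lineage)
    if subtype is None:
        raise ValueError(f"No subtype known for lineage {lineage!r}")
    path = data_dir / subtype / f"{gene}.fasta"
    if not path.is_file():
        # Fail loudly so a missing dataset is caught before alignment starts.
        raise FileNotFoundError(f"Missing reference dataset: {path}")
    return path
```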
Notes:
Reuse the alignment functions from align_frames.py.
Given a list of known CPVs per lineage, first calculate the distance from every sample to each known CPV to build a distance matrix. For each sample, the closest ancestor among the known CPVs is its CPV. Ties need a priority list; the oldest vaccine strain or CPV will probably take precedence.
Store the priority list in CPV.tsv with the headers cpv, gene and priority, where priority is defined per gene. Use previous plots to create the priority list.
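The selection rule in these notes can be sketched as follows, assuming the sample-to-CPV distance matrix for one gene has already been computed and that a smaller priority number means a preferred CPV (for example, 1 for the oldest vaccine strain). load_cpv_priorities and assign_cpv are hypothetical names, and the tie tolerance is an arbitrary choice.

```python
import csv
import math
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def load_cpv_priorities(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Read CPV.tsv (columns cpv, gene, priority) into {gene: {cpv: priority}}."""
    reader = csv.DictReader(lines, delimiter="\t")
    missing = {"cpv", "gene", "priority"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CPV.tsv is missing columns: {sorted(missing)}")
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for row in reader:
        table[row["gene"].strip()][row["cpv"].strip()] = int(row["priority"])
    return dict(table)


def assign_cpv(
    distances: Mapping[str, Mapping[str, float]],
    priority: Mapping[str, int],
    tol: float = 1e-9,
) -> Dict[str, str]:
    """Pick the nearest known CPV for each sample.

    distances[sample][cpv] is the sample-to-CPV distance for one gene.
    Distances within `tol` of the minimum count as ties; ties go to the CPV
    with the lowest priority number, then to the alphabetically first name.
    CPVs absent from the priority list sort after every listed CPV.
    """
    result: Dict[str, str] = {}
    for sample, row in distances.items():
        if not row:
            raise ValueError(f"No CPV distances for sample {sample!r}")
        best = min(row.values())
        tied = [cpv for cpv, d in row.items() if d - best <= tol]
        result[sample] = min(tied, key=lambda cpv: (priority.get(cpv, math.inf), cpv))
    return result
```

Calling assign_cpv(distances, priorities[gene]) once per gene keeps the per-gene priority lists separate, and the final name-based tie-break makes the output deterministic even when two CPVs share a priority.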
Tasks:
[ ] Parse the FASTA file and combine it with the reference set
[ ] Parse or get from fuzee meta files
[ ] Get low reactors from fuzee
[ ] Rename sequences
[ ] Plot trees using toyplot
[ ] Calculate the CPV using the .get_distance method from the ete3 package
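With default arguments, ete3's TreeNode.get_distance returns the sum of branch lengths on the path between two nodes (the patristic distance). The dependency-free sketch below computes the same quantity on a hypothetical parent-pointer encoding of the tree, which can be handy for unit-testing the CPV step without ete3; it illustrates the quantity, it is not a replacement for the ete3 call named in the task.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tree encoding: node -> (parent, branch length); the root maps to (None, 0.0).
Tree = Dict[str, Tuple[Optional[str], float]]

# Newick equivalent: (A:0.1,(B:0.2,(C:0.3,D:0.4):0.5):0.6);
EXAMPLE_TREE: Tree = {
    "root": (None, 0.0),
    "A": ("root", 0.1),
    "n1": ("root", 0.6),
    "B": ("n1", 0.2),
    "n2": ("n1", 0.5),
    "C": ("n2", 0.3),
    "D": ("n2", 0.4),
}


def distance_to_ancestors(tree: Tree, node: str) -> Dict[str, float]:
    """Cumulative branch length from node to itself and to each of its ancestors."""
    dist = {node: 0.0}
    total = 0.0
    parent, length = tree[node]
    while parent is not None:
        total += length
        dist[parent] = total
        parent, length = tree[parent]
    return dist


def patristic_distance(tree: Tree, a: str, b: str) -> float:
    """Sum of branch lengths on the path a -> last common ancestor -> b."""
    up_from_a = distance_to_ancestors(tree, a)
    total = 0.0
    node = b
    # Climb from b until reaching a node that is also an ancestor of a.
    while node not in up_from_a:
        parent, length = tree[node]
        if parent is None:
            raise ValueError(f"{a!r} and {b!r} are not in the same tree")
        total += length
        node = parent
    return total + up_from_a[node]
```

For the example tree, the A to C distance is 0.1 + 0.6 + 0.5 + 0.3 = 1.5. Filling a sample-by-CPV matrix then amounts to calling the distance function once for every (sample, known CPV) pair.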