rcedgar closed this 3 years ago
Trees are here: https://serratus-public.s3.amazonaws.com/pb/results/otus_plus_toro.tar.gz
Rooted versions have the .rooted suffix.
Note that the above is the result of 100 tree searches + 1000 bootstrap replicates. If we decide to go with this, I can still scale it up. This run took approx. 2 h 40 min.
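Standard (Felsenstein) nonparametric bootstrapping resamples alignment columns with replacement and then re-runs the tree search on each replicate to obtain support values. The sketch below shows only the resampling step, assuming the alignment is held as a dict of equal-length aligned strings; `bootstrap_alignment` and the toy sequences are hypothetical and do not reproduce the tool or settings used for the run above.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one nonparametric bootstrap replicate of a multiple alignment.

    `alignment` maps sequence name -> aligned sequence (all the same length).
    Columns are sampled with replacement, so the replicate has the same
    length as the input, but some columns repeat and others are missing.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    ncols = lengths.pop()
    # The same sampled column indices are applied to every sequence,
    # so whole alignment columns are resampled together.
    cols = [rng.randrange(ncols) for _ in range(ncols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment: two ingroup OTUs and one Toro outgroup.
rng = random.Random(1)
aln = {"otu1": "MKV-LS", "otu2": "MKIALS", "toro1": "MRV-IT"}
replicates = [bootstrap_alignment(aln, rng) for _ in range(1000)]
```

Each replicate would then be passed to the tree-search program; the fraction of replicate trees containing a given clade is that clade's bootstrap support.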
For the tree search, can we include all Toro (I would opt for all Nido) and prune down to 3 leaves post hoc? We don't know ahead of time which Toro sequences are closest to CoV/Eps, so we can't choose which ones to include in advance.
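Pruning post hoc amounts to dropping the unwanted leaves and collapsing any internal node left with a single child. A minimal topology-only sketch, assuming the tree is held as nested tuples (leaves are name strings); `prune`, `leaves` and the leaf names are hypothetical, and branch lengths are ignored here (with real Newick trees, the lengths of collapsed edges would need to be summed).

```python
def leaves(tree):
    """List the leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def prune(tree, keep):
    """Return a copy of `tree` restricted to leaves in `keep`, or None if none remain.

    Leaves are strings; internal nodes are tuples of children. Internal nodes
    left with a single child are collapsed, so no unary nodes remain.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


# Hypothetical tree: CoV/Eps ingroup plus four Toro leaves and one other Nido leaf.
tree = ((("cov1", "cov2"), "eps1"),
        (("toro1", "toro2"), ("toro3", ("toro4", "arteri1"))))
# Keep everything except the two outgroup leaves judged least useful.
keep = set(leaves(tree)) - {"toro4", "arteri1"}
pruned = prune(tree, keep)
```

Searching with the full outgroup and pruning afterwards keeps the ingroup topology from the larger search, while the figure shows only the selected outgroup leaves.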
There are four Toro RefSeqs. I checked their identities to CoV: they cluster close to each other and far from CoV. For this round, I think including three as the smallest non-trivial outgroup is the right approach. For the next round, after more discovery, I agree we should do a deeper dive into Nido and check outgroups more carefully. This is a fair amount of additional work: getting the PFAM alignments, reviewing how many non-RefSeq sequences we need to include to get good diversity, and so on. SNW IMHO.
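The identity check described above can be sketched as percent identity over aligned columns. This assumes the sequences are already aligned to equal length and that columns with a gap in either sequence are skipped, which is one of several common conventions; `percent_identity` and `max_identity` are hypothetical helpers.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; identity is
    the percentage of the remaining columns with the same residue.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def max_identity(query, references):
    """Highest percent identity of `query` to any sequence in `references`."""
    return max(percent_identity(query, ref) for ref in references)
```

Applied to each Toro sequence, `max_identity` against the CoV set and against the other Toro sequences would show the pattern described: high identity within Toro, low identity to CoV.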
Data is organized into OTUs defined by clustering RdRp amino-acid sequences.
Fig. 1 uses two OTU thresholds: 99% (~sub-strain) and 97% (~strain).
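One common way to define OTUs at a fixed identity threshold is greedy centroid clustering (UCLUST-style): sequences are processed in order, and each joins the first existing centroid it matches at or above the threshold, otherwise it becomes a new centroid. The sketch below assumes pre-aligned, equal-length sequences processed longest first; the clustering actually used for these OTUs may differ, and `greedy_cluster` and the toy data are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(seqs, threshold):
    """Greedy centroid clustering; returns {centroid_name: [member names]}."""
    centroids = {}
    # Longest (fewest gaps) first, ties broken by name for determinism.
    order = sorted(seqs, key=lambda n: (-len(seqs[n].replace("-", "")), n))
    for name in order:
        for c in centroids:
            if percent_identity(seqs[name], seqs[c]) >= threshold:
                centroids[c].append(name)
                break
        else:
            # No centroid within the threshold: start a new OTU.
            centroids[name] = [name]
    return centroids


def mutate(seq, positions):
    """Hypothetical helper: substitute 'W' at the given positions."""
    s = list(seq)
    for p in positions:
        s[p] = "W"
    return "".join(s)


# Synthetic 100-residue sequences at 99%, 98% and 95% identity to the base.
base = "M" + "K" * 99
seqs = {
    "a_base": base,
    "b_sub1": mutate(base, [10]),
    "c_sub2": mutate(base, [20, 30]),
    "d_sub5": mutate(base, [40, 50, 60, 70, 80]),
}
otus99 = greedy_cluster(seqs, 99.0)
otus97 = greedy_cluster(seqs, 97.0)
```

Running the two thresholds independently does not by itself guarantee that every 99% OTU falls inside a single 97% OTU; one option is to cluster the 99% centroids at 97% so the levels nest.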
Central to the figure is a radial cladogram, something like this:
https://drive5.com/tmp/pol.svg
The tree will be constructed by @Pbdas using CoV OTUs (GenBank + Serratus) plus three Toro OTUs as an outgroup.
Each leaf on the tree is one 97% OTU.
Segments of the tree are colored according to:
Novel segments discovered by Serratus (e.g. Epsiloncoronavirus) are visually distinguished.
Exterior to the tree are Circos-like rings.
Ring 1. Previously known virus-host associations.
Ring 2. Virus-host associations added by Serratus.
Hosts are classified by order (Primate, Rodent, ...). There will be ~10 orders. Ten is too many colors for a key, so we will have to use additional visual features such as cross-hatching.
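With ~10 host orders, pairing a small color set with hatch patterns keeps the key readable. A sketch assuming 5 hypothetical colors and 2 hatch styles (`'///'` is a matplotlib-style hatch string); `assign_styles`, the palette and the order names are illustrative only, not the figure's final choices.

```python
from itertools import product

# Hypothetical palette and hatch patterns; the real figure's choices are TBD.
COLORS = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]
HATCHES = ["", "///"]


def assign_styles(orders, colors=COLORS, hatches=HATCHES):
    """Map each host order to a distinct (color, hatch) pair.

    All colors are used solid first, then repeated with the next hatch,
    so 5 colors x 2 hatches covers 10 orders.
    """
    combos = list(product(hatches, colors))
    if len(orders) > len(combos):
        raise ValueError(f"{len(orders)} orders but only {len(combos)} styles")
    return {order: (color, hatch) for order, (hatch, color) in zip(orders, combos)}


# Hypothetical list of ten host orders.
orders = ["Primates", "Rodentia", "Chiroptera", "Carnivora", "Artiodactyla",
          "Perissodactyla", "Lagomorpha", "Eulipotyphla", "Galliformes", "Anseriformes"]
styles = assign_styles(orders)
```

The same mapping can drive both Ring 1 and Ring 2, so a host order has one appearance throughout the figure.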
Ring 3. Diversity of each 97% OTU, measured as the number of 99% OTUs it contains, divided into three categories:
3a. OTUs in GenBank only.
3b. OTUs in GenBank and Serratus.
3c. New discoveries, i.e. Serratus only.
Visualization of these numbers is TBD; they may be small enough to have one dot per 99% OTU.
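The Ring 3 tallies can be computed from a per-99%-OTU table. This sketch assumes each 99% OTU is recorded as a hypothetical `(otu97, in_genbank, in_serratus)` tuple and that every 99% OTU nests inside exactly one 97% OTU; `ring3_counts` and the toy rows are illustrative.

```python
from collections import defaultdict


def ring3_counts(otus99):
    """Count 99% OTUs per 97% OTU, split into the Ring 3 categories.

    `otus99` is an iterable of (otu97, in_genbank, in_serratus) tuples,
    one per 99% OTU. Returns {otu97: {"3a": n, "3b": n, "3c": n}}.
    """
    counts = defaultdict(lambda: {"3a": 0, "3b": 0, "3c": 0})
    for otu97, in_gb, in_srt in otus99:
        if in_gb and in_srt:
            cat = "3b"  # in GenBank and Serratus
        elif in_gb:
            cat = "3a"  # GenBank only
        elif in_srt:
            cat = "3c"  # new discovery, Serratus only
        else:
            raise ValueError(f"99% OTU in {otu97} has no source")
        counts[otu97][cat] += 1
    return dict(counts)


# Hypothetical rows: one per 99% OTU.
rows = [
    ("otu97_1", True, False),
    ("otu97_1", True, True),
    ("otu97_1", False, True),
    ("otu97_1", False, True),
    ("otu97_2", True, True),
]
result = ring3_counts(rows)
```

The sum of the three counts for a 97% OTU is its diversity; if the counts stay small, each can be drawn directly as that many dots in the category's style.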