cdshaffer opened 5 years ago
The best approach would probably be to keep track of the clusters and subclusters of the hits as the BLAST searches are performed for the rough assignment of genes to phams. Ideally, set up some kind of counter on the cluster/subcluster of the top blastp hits. Then, at the end of all the BLAST searches, try to pick a cluster.
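A minimal sketch of the counter idea, assuming each gene's top blastp hit has already been reduced to a `(cluster, subcluster)` pair (the input format and the function name `pick_cluster` are hypothetical, not taken from the codebase). It picks the most common cluster first, then the most common subcluster within that cluster:

```python
from collections import Counter
from typing import List, Optional, Tuple

Hit = Tuple[Optional[str], Optional[str]]  # (cluster, subcluster) of a gene's top blastp hit


def pick_cluster(top_hits: List[Hit]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: tally top-hit clusters/subclusters and return the winners."""
    # Count how many genes had their top hit in each cluster (ignore unclustered hits).
    cluster_counts = Counter(c for c, _ in top_hits if c is not None)
    if not cluster_counts:
        return None, None
    best_cluster, _ = cluster_counts.most_common(1)[0]

    # Only consider subclusters that belong to the winning cluster.
    sub_counts = Counter(s for c, s in top_hits if c == best_cluster and s is not None)
    best_sub = sub_counts.most_common(1)[0][0] if sub_counts else None
    return best_cluster, best_sub


hits = [("A", "A1"), ("A", "A1"), ("A", "A2"), ("B", "B1"), (None, None)]
print(pick_cluster(hits))  # ('A', 'A1')
```

Restricting the subcluster vote to the winning cluster avoids reporting a subcluster that contradicts the chosen cluster.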
Actively working on this in the Schema10 branch. Currently the code keeps track of all clusters represented in each pham for each gene in the phage. It then counts up this representation and calculates the percentage of genes in the new phage genome whose pham has at least some members in a given cluster. It is assumed that at least 90% of the genes will have other genes from the same cluster, and that this cluster is represented about 15% more than any other cluster. The code is currently in report.py.
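The heuristic described above can be sketched as follows. This is not the code in report.py; it is an illustration assuming the input is a hypothetical dict mapping each gene to the set of clusters present in its pham, and assuming the "about 15%" margin means percentage points of genes between the top cluster and the runner-up:

```python
from collections import Counter
from typing import Dict, Optional, Set


def assign_cluster(
    gene_pham_clusters: Dict[str, Set[str]],
    min_fraction: float = 0.90,  # share of genes whose pham contains the cluster
    min_margin: float = 0.15,    # assumed lead over the next most represented cluster
) -> Optional[str]:
    """Hypothetical sketch: return a cluster only if it clears both thresholds."""
    n_genes = len(gene_pham_clusters)
    if n_genes == 0:
        return None

    # Each cluster is counted at most once per gene, because each value is a set.
    counts: Counter = Counter()
    for clusters in gene_pham_clusters.values():
        counts.update(clusters)
    if not counts:
        return None

    ranked = counts.most_common()
    best, best_n = ranked[0]
    runner_up_n = ranked[1][1] if len(ranked) > 1 else 0

    best_fraction = best_n / n_genes
    margin = (best_n - runner_up_n) / n_genes
    if best_fraction >= min_fraction and margin >= min_margin:
        return best
    return None  # ambiguous or poorly supported: leave unassigned
```

For example, ten genes whose phams all contain cluster `A`, three of which also contain cluster `B`, give `A` at 100% with a 70-point lead, so `A` is returned. If every pham contained both `A` and `B`, the margin would be zero and no cluster would be assigned.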
By changing colors based on switches in subcluster, the resulting tracks show a color switch between the phage loaded as a profile and the subsequent phages. This is because the profile phage does not have an assigned subcluster, so it is in essence subcluster=None. If we want the color of a profile to match the color of a phage from the same subcluster, we would need to add code to assign a subcluster to unphamerated phages. This would be tricky, as there is no pre-computed BLAST database of whole phage sequences and hence no easy way to assign a subcluster.
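A small sketch of why the profile track changes color, assuming colors simply alternate through a palette whenever the subcluster differs from the previous phage (the function `track_colors` and the two-color palette are hypothetical, not the project's actual drawing code). Because the profile phage carries `None`, the first real subcluster always counts as a switch:

```python
from typing import List, Optional, Sequence


def track_colors(
    subclusters: Sequence[Optional[str]],
    palette: Sequence[str] = ("blue", "orange"),
) -> List[str]:
    """Hypothetical sketch: advance to the next palette color on each subcluster change."""
    colors: List[str] = []
    index = 0
    for i, sub in enumerate(subclusters):
        # None (unassigned profile phage) compares unequal to any real subcluster.
        if i > 0 and sub != subclusters[i - 1]:
            index = (index + 1) % len(palette)
        colors.append(palette[index])
    return colors


print(track_colors([None, "A1", "A1", "A2"]))  # ['blue', 'orange', 'orange', 'blue']
print(track_colors(["A1", "A1", "A1", "A2"]))  # ['blue', 'blue', 'blue', 'orange']
```

The second call shows the desired result if the profile phage could be assigned subcluster `A1`: it would share a color with the following A1 phages.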