SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

PIRATE_plots.pdf created by plot_summary.R #73

Closed by haruosuz 2 years ago

haruosuz commented 2 years ago

If I understand correctly, the figure on page 9 of PIRATE_plots.pdf, created by scripts/plot_summary.R, shows shared gene presence per isolate ordered alongside the tree generated by FastTree from binary gene_family presence-absence data. In the heatmap ("Pangenome cluster presence/absence"), gene family presence is indicated by a colored block per column, and gene family absence by a white block. What do the different colors (low="firebrick", high="darkblue") indicate?

References:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6785682/

Figure 4 (D) and Figure 5 (D): Shared gene presence per isolate ordered alongside the phylogenetic tree. Gene family presence is indicated by a blue block per column.

https://github.com/SionBayliss/PIRATE

https://github.com/SionBayliss/PIRATE/blob/master/scripts/plot_summary.R

# [optional] tree based plots
tree_file <- sprintf("%s/binary_presence_absence.nwk", input_root)

*snip*

  # make phandango plot
  phandango_allele <- gheatmap(tree.plot, hpos_plot, low="firebrick", high="darkblue", colnames = F, 
           offset = mx_raw*1, width = 6, color=NA) +
    theme(legend.position = "bottom") +
    ggtitle("Pangenome cluster presence/absence") +
    theme(plot.title = element_text(face = "bold", hjust = 0.5), 
          legend.key.width = unit(1.2, "cm") )

SionBayliss commented 2 years ago

I believe it represents the percentage identity threshold at which the sequences clustered in that gene family (firebrick = low, darkblue = high). The actual values are set by -s (e.g. with thresholds of 90%, 95% and 98%, low = 90 and high = 98).
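
To illustrate the answer above, here is a minimal toy sketch of how gheatmap could produce such a gradient. It assumes each heatmap cell holds the clustering threshold of a gene family, with NA where the family is absent. The tree, the matrix `thresholds` and the gene names `g1` to `g4` are hypothetical and not taken from PIRATE output. As far as I understand, gheatmap applies a continuous low-to-high fill gradient when the values are numeric and leaves NA cells unfilled, which would explain the white blocks for absent families.

```r
library(ape)      # rtree() for a toy tree
library(ggplot2)  # theme()
library(ggtree)   # ggtree(), gheatmap()

set.seed(1)
tree <- rtree(4)  # random 4-tip tree, stands in for binary_presence_absence.nwk

# Hypothetical values: identity threshold (%) per gene family, NA = absent
thresholds <- matrix(
  c(90, 95, 98, NA,
    98, NA, 90, 95,
    NA, 98, 98, 90,
    95, 95, NA, 98),
  nrow = 4, byrow = TRUE,
  dimnames = list(tree$tip.label, paste0("g", 1:4))
)

p <- ggtree(tree)

# Numeric cells are coloured from firebrick (lowest value) to darkblue (highest)
heat <- gheatmap(p, as.data.frame(thresholds),
                 low = "firebrick", high = "darkblue",
                 colnames = FALSE, color = NA, width = 2) +
  theme(legend.position = "bottom")

print(heat)
```

In this sketch the legend at the bottom runs from 90 to 98, so a firebrick block marks a family clustered at the lowest threshold and a darkblue block marks one clustered at the highest.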