YuLab-SMU / ggmsa

:traffic_light: Visualizing publication-quality multiple sequence alignment using ggplot2
http://yulab-smu.top/ggmsa

Add overlay for Domains and other annotations #67

Open docmanny opened 5 months ago

docmanny commented 5 months ago

Hello!

I've been enjoying using ggmsa for my visualizations. However, it would be nice to have a function for plotting domains and other functional annotations on top of MSA plots.

Currently, the closest I can get to this is using a combination of aplot and ggmsa:

library(tidyverse)
library(ggmsa)
library(aplot)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

domains <- tribble(
  ~label, ~start, ~end, ~domain,
  "PH4H_Homo_sapiens", 36, 144, "ACT"
)
(
p.msa <- ggmsa(protein_sequences, start = 1, end = 280, char_width = 0.25, ref = "PH4H_Homo_sapiens", seq_name = TRUE)
)
(
  p.domains <- domains %>% 
    ggplot(
      aes(
        xmin=start,
        xmax=end,
        ymin = label %>% as.factor %>% as.numeric %>% magrittr::subtract(0.5),
        ymax=label %>% as.factor %>% as.numeric %>% magrittr::add(0.5),
        fill=domain,
        label=domain
      )
    ) + 
    geom_segment(
      aes(
        x=1,
        xend=280,
        y=as.factor(label),
        yend=as.factor(label)
      )
    ) +
    geom_rect() +
    geom_text(
      aes(
        x=(start + end)/2,
        y=as.factor(label)
      )
    ) + 
    scale_x_continuous(breaks = c(36,144), expand = c(0,0)) + 
    scale_y_discrete(expand = c(0,0)) + 
    theme_minimal() +
    theme(
      axis.line.y = element_blank(),
      axis.text.y = element_text(face='bold'),
      axis.title.y = element_blank(),
      axis.ticks.y = element_blank(),
      axis.title.x = element_blank(),
      plot.margin = unit(c(0,0,0,0), "mm"),
      legend.position = "none"
    )
)

p.domains %>% insert_bottom(p.msa, height = 9)

image

PH4H makes for a simple example here because there are no gaps in the reference sequence within the domain. Ideally, though, the function would also automatically extend domains to accommodate gaps introduced into the reference by the MSA.
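To make the gap-handling request concrete, here is a minimal base-R sketch of one way it could work: translate ungapped residue positions of the reference into alignment column positions, so a domain stretches across any gaps that fall inside it. The helper `map_to_alignment()` is hypothetical and not part of ggmsa, and the aligned reference string is a hard-coded toy sequence rather than one read from the sample FASTA.

```r
# Hypothetical helper (not part of ggmsa): map ungapped residue positions
# of a reference sequence to column positions in the alignment.
map_to_alignment <- function(aligned_seq, positions, gap_chars = c("-", ".")) {
  chars <- strsplit(aligned_seq, "", fixed = TRUE)[[1]]
  # Alignment columns that hold a residue (i.e. are not gaps), in order
  residue_cols <- which(!chars %in% gap_chars)
  stopifnot(all(positions >= 1), all(positions <= length(residue_cols)))
  # The i-th residue of the ungapped sequence sits in column residue_cols[i]
  residue_cols[positions]
}

# Toy aligned reference (assumption): residues ACDEFG with gaps inserted
aligned_ref <- "AC--DEF-G"

# Domain given in ungapped reference coordinates, as in the tribble above
domains <- data.frame(
  label = "ref", start = 2, end = 5, domain = "toy",
  stringsAsFactors = FALSE
)

# Convert both ends to alignment columns; the domain now spans the gap
domains$aln_start <- map_to_alignment(aligned_ref, domains$start)
domains$aln_end   <- map_to_alignment(aligned_ref, domains$end)
```

In the domain plot above, `aln_start` and `aln_end` could then replace `start` and `end` in `xmin`/`xmax`, assuming the x axis of the MSA panel counts alignment columns.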