PoisonAlien / trackplot

Generate IGV style locus tracks from bigWig files in R
133 stars 16 forks source link
bigwigs igv-like locus ucsc-browser

trackplot - Fast and easy visualisation of bigWig files in R

GitHub closed issues

Introduction

trackplot.R is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files.

Installation

trackplot.R is a standalone R script and requires no installation. Just source it and you're good to go!

source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true")

# OR

download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R")
source('trackplot.R') 

# OR If you prefer to have it as package

remotes::install_github(repo = "poisonalien/trackplot")

Features

Why trackplot?

Dependencies

  1. data.table R package - which itself has no dependency.
  2. bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; /usr/local/bin) or a directory under the PATH.

Usage

Simple usage - Make a table of all the bigWig files to be analysed with read_coldata() and pass it to the downstream functions.

flowchart TD
    a[bigWig file list] -->A{read_coldata}
    A --> B{track_extract}
    B --> B1[track_plot]
    A --> C{profile_extract}
    C --> C1[profile_summarize]
    C --> C3[profile_heatmap]
    C1 --> C2[profile_plot]
    A --> D{extract_summary}
    D --> D1[pca_plot]
    D --> D2[diffpeak]
    D2 --> D3[volcano_plot]
#Path to bigWig files
bigWigs = c("H1_Oct4.bw", "H1_Nanog.bw", "H1_k4me3.bw", 
            "H1_k4me1.bw", "H1_k27ac.bw", "H1_H2az.bw", "H1_Ctcf.bw")

#Make a table of bigWigs along with ref genome build
bigWigs = read_coldata(bws = bigWigs, build = "hg19")

trackplots

track_extract() and track_plot() are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, track_summarize can summarize tracks by condition.

Step-1: Extract signal from bigWig files

#Region to plot
oct4_loci = "chr6:31125776-31144789"

#Extract bigWig signal for a loci of interest
t = track_extract(colData = bigWigs, loci = oct4_loci)

#Or you can also specifiy a gene name instead of a loci 
# - loci and gene models will be automatically extracted from UCSC genome browser
t = track_extract(colData = bigWigs, gene = "POU5F1")

Step-2: Plot

Basic plot

track_plot(summary_list = t)

Add cytoband and change colors for each track

track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60")
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE)

Heighilight regions of interest (any bed files would do)

oct4_nanog_peaks = c("H1_Nanog.bed","H1_Oct4.bed") #Peak files 
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks)

Add some chromHMM tracks to the bottom

chromHMM data should be a bed file with the 4th column containing chromatin state. See here for an example file.

Note that the color code for each of the 15 states are as described here. In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument chromHMM_cols

chromHMM_peaks = "H1_chromHMM.bed"

track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, chromHMM = chromHMM_peaks)

Add some chromHMM tracks from UCSC

UCSC has 9 cell lines for which chromHMM data is available. These can be added automatically in case if you dont have your own data. In this case, use the argument ucscChromHMM with any values from TableName column of the below table.

                    TableName    cell                      Description              Tissue Karyotype
1: wgEncodeBroadHmmGm12878HMM GM12878     B-lymphocyte, lymphoblastoid               blood    normal
2:  wgEncodeBroadHmmH1hescHMM H1-hESC             embryonic stem cells embryonic stem cell    normal
3:   wgEncodeBroadHmmHepg2HMM   HepG2         hepatocellular carcinoma               liver    cancer
4:   wgEncodeBroadHmmHepg2HMM    HMEC         mammary epithelial cells              breast    normal
5:    wgEncodeBroadHmmHsmmHMM    HSMM        skeletal muscle myoblasts              muscle    normal
6:   wgEncodeBroadHmmHuvecHMM   HUVEC umbilical vein endothelial cells        blood vessel    normal
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, 
          ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))

Re-organize tracks

By default tracks are organized from top to bottom as c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument layout_ord. Furthermore, bigWig tracks themselves can be ordered with the argument bw_ord which accepts the names of the bigWig tracks as input and plots them in the given order.

#Draw only NANOG, OCT4 bigWigs in that order. Re-organize the layout in the order chromHMM track, gene track, cytoband track. Rest go to the end.
track_plot(
  summary_list = t,
  col = track_cols,
  show_ideogram = TRUE,
  genename = c("POU5F1", "TCF19"),
  peaks = oct4_nanog_peaks,
  peaks_track_names = c("NANOG", "OCT4"),
  groupAutoScale = FALSE, ucscChromHMM = "wgEncodeBroadHmmH1hescHMM", y_min = 0, 
  bw_ord = c("NANOG", "OCT4"), 
  layout_ord = c("h", "g", "c")
)

narrowPeaks and broadPeaks

All of the above plots can also be generated with narrowPeak or broadPeak files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per UCSC convention

narrowPeak is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files.

narrowPeaks = c("H1_Ctcf.bed", "H1_H2az.bed", "H1_k27ac.bed", 
                "H1_k4me1.bed", "H1_k4me3.bed", "H1_Nanog.bed", 
                "H1_Oct4.bed", "H1_Pol2.bed")

#Use peak as input_type
narrowPeaks = read_coldata(narrowPeaks, build = "hg19", input_type = "peak")

oct4_loci = "chr6:30,818,383-31,452,182" #633Kb region for example

narrowPeaks_track = track_extract(colData = narrowPeaks, loci = oct4_loci)

#Rest plotting is same
track_plot(summary_list = narrowPeaks_track, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, 
          ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))

image

profileplots

profile_extract() -> profile_summarize() -> profile_plot() are functions to generate density based profile-plots from bigWig files.

Example data from GSE99183 where U87 glioma cell lines are treated with a DMSO and a BRD4 degradaer.

bws = c("GSM2634756_U87_BRD4.bw", "GSM2634757_U87_BRD4_dBET_24h.bw", "GSM2634758_U87_BRD4_dBET_2h.bw")
bws = read_coldata(bws = bws, 
                  sample_names = c("BRD4", "BRD4_dBET_24h", "BRD4_dBET_2h"), 
                  build = "hg19")

Refseq transcripts

#Extract signals from bigWig files around refseq transcripts
pe_refseq = profile_extract(colData = bws, ucsc_assembly = TRUE, 
                            startFrom = 'start', up = 1500, down = 1500)

#Estimate mean signal
ps_refseq = profile_summarize(sig_list = pe_refseq) 

#Plot
profile_plot(ps_refseq)

Custom BED regions

#BRD4 binding sites 
bed = "GSM2634756_U87_BRD4_peaks.narrowPeak.gz"

#Center and extend 1500 both ways from the peak center
pe_bed = profile_extract(colData = bws, bed = bed, startFrom = "center", 
                          up = 1500, down = 1500, nthreads = 4)

#Estimate mean signal
ps_bed = profile_summarize(sig_list = pe_bed) 

#Plot
profile_plot(ps_bed)

heatmap

Output from profile_extract can be used to draw a heatmap with profile_heatmap

profile_heatmap(mat_list = pe_bed, top_profile = TRUE, zmaxs = 0.8)

PSA If you find the tool useful, consider starring this repository or up voting this Biostars thread so that more poeple can find it :)

Caveat

Citation

If you find the script useful consider citing trackplot and bwtool

Mayakonda, Anand, and Frank Westermann. Trackplot: a fast and lightweight R script for epigenomic enrichment plots. Bioinformatics advances vol. 4,1 vbae031. 28 Feb. 2024. PMID: 38476298

Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: 24489365

Acknowledgements

Joschka Hey for all the cool suggestions :)