GreenleafLab / scScalpChromatin

22 stars 11 forks source link

Reproducing Figure 6 #2

Closed Goultard59 closed 1 year ago

Goultard59 commented 1 year ago

Hi,

I would like to reproduce Figure 6 of your paper with the visualisation of a motif in a particular peak.

Thanks for your helps, Adrien Dufour.

boberrey commented 1 year ago

If you want to reproduce Figure 6 with gkmSVM and gkmexplain importance scores, you will need to repeat the model training and gkmexplain scoring. There are several scripts used for this which are available in 'scScalpChromatin/GWAS/gkmSVM/'. In general you need to first generate null training sequences with 'gen_null_seqs.R', train gkmSVM models with 'gkmsvm_train.sh', generate SNP ref/alt sequences using 'gen_snp_seqs.R', use gkmexplain to generate base-resolution scores 'gkmsvm_explain_snps.sh', reformat scores for plotting with 'score_gkmexplain_snp_centered.R', and finally plot results with 'link_genes_gkmexplain.R'.

If you just want to plot sequence logos of particular motifs (will have no bearing on your actual data), you can use the seqlogo package and cisBP motifs like so:

# Plot all seqlogos
library(chromVARmotifs)
library(seqLogo)
library(ggseqlogo)
library(ggplot2)
library(dplyr)
library(tidyr)

# Get all cisBP motifs
data("human_pwms_v2")
cisBP_motifs <- human_pwms_v2

plotDir <- "/my/output/directory"
cisBP_dir <- paste0(plotDir, "/cisBP_human")
dir.create(cisBP_dir, showWarnings = FALSE, recursive = TRUE)

revcomp <- c(A="T",G="C",C="G",T="A")

# cisBP SeqLogos:
for(i in seq_along(cisBP_motifs)){
  mt <- cisBP_motifs[[i]]
  motifName <- mt@name
  message(sprintf("Plotting motif %s, %s of %s", motifName, i, length(cisBP_motifs)))
  # To convert PWMs back to position frequency matrices, do the reverse of what was described here:
  # https://github.com/GreenleafLab/chromVARmotifs
  pwm <- makePWM(exp(mt@profileMatrix)*0.25)@pwm
  width <- ncol(pwm)*0.5 + 2
  pdf(paste0(cisBP_dir, sprintf("/%s_seqLogo.pdf", motifName)), width=width, height=4)

  # Forward
  p <- (
    ggseqlogo(pwm, seq_type="dna", method='bits')
    + ggtitle(motifName)
  )
  print(p)

  # Reverse complement
  rcpwm <- pwm[,c(ncol(pwm):1)]
  rownames(rcpwm) <- revcomp[rownames(rcpwm)]
  p <- (
    ggseqlogo(rcpwm[rownames(pwm),], seq_type="dna", method='bits')
    + ggtitle(paste0(motifName, "-rev"))
  )
  print(p)
  dev.off()
}