YuLab-SMU / ChIPseeker

:dart: ChIP peak Annotation, Comparison and Visualization
https://onlinelibrary.wiley.com/share/author/GYJGUBYCTRMYJFN2JFZZ?target=10.1002/cpz1.585
220 stars 74 forks source link

Add function of ploting gene body #154

Closed MingLi-929 closed 3 years ago

MingLi-929 commented 3 years ago

This pull request is to add the function of plot gene body. The basic idea is to scaled the genes having different lengths to the equal length, the idea was derived from the function of deeptools. (https://deeptools.readthedocs.io/en/develop/content/tools/computeMatrix.html)

The way to reach the goal of scaled genes is to devide gene in the same amounts of boxes. Because the amount of boxes is equal, the gene can be thought of scaling to equal length. So three parameters( scaledlength , binsize, box) is set to reach this goal.

The first is scaledlength.the parameter represents the length of different genes are scaled to. For example, if we want to scale the TSS region(-3k,3k) and we set scaledlength=8000. Then the 6k bp region will be scaled to 8k bp.

The second parameters is binsize. The parameter represents the amount of the amount of nucleotide base in each box.

The third parameters is box. The parameter represents the amount of boxes.

Here is an example to illustrate the function.

scaledlength = 8000
binsize = 50
box = 8000/50 = 160

gene1: 
genebodylength = 3200, box =160, so each box has 3200/160= 20 bp. 
This means 20 bp is extended to 50 bp(binsize)

gene2: 
genebodylength = 9600, box =160, so each box has 9600/160= 60 bp. 
This means 60 bp is cut to 50 bp(binsize)

In this way, all the genes having different lengths are scaled to the same size.

The usage and testing of the function lay below.

  1. compare the plotAvgProf() and plotGeneBody()
    
    devtools::install_github("MingLi-929/ChIPseeker")

library(ChIPseeker) library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

files <- getSampleFiles() peak <- files[[4]]

promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)

genebody <- getGeneBody(TxDb = txdb, type = "gene")

use plotAvgProf() to draw the TSS region

tagmatrix <- getTagMatrix(peak, weightCol = "V5", windows = promoter) plotAvgProf(tagmatrix, xlim = c(-3000,3000),conf = 0.95)

![plotavg](https://user-images.githubusercontent.com/78794151/128011035-93c6de84-0f0b-4009-bece-a02bc33e13c5.png)

use plotGeneBody to draw the TSS region

bodymatrix <- getGenebodyMatrix(peak, weightCol = "V5", windows = promoter, scaledlength = 8000, binsize = 50, min_body_length = 1000) plotGeneBody(bodymatrix,conf = 0.95, scaledlength = 8000,binsize = 50)

![plotgenebody](https://user-images.githubusercontent.com/78794151/128011116-59fe1cc7-6b0e-466b-83b1-d8c1ffa1b842.png)

2. compare `plotAvgProf2()` and `plotGeneBody2`

use plotAvgProf2 to draw the TSS region

plotAvgProf2(peak, TxDb=txdb, upstream=3000, downstream=3000, conf = 0.95)

![plotavg2](https://user-images.githubusercontent.com/78794151/128011353-b1d65aed-0146-4480-82fa-05f8ce3f87c2.png)

use plotGeneBody2 to draw the TSS region

plotGeneBody2(peak, type = "promoters", conf = 0.95, TxDb = txdb)

![plotgenebody2](https://user-images.githubusercontent.com/78794151/128011410-14be6445-741d-4710-a6b3-84b646d00c75.png)

3. compare `plotAvgProf2()` and `plotGeneBody2` to draw a list of peaks 

use plotGeneBody2 to draw a list of the TSS region

plotAvgProf2(files,TxDb = txdb, upstream = 3000, downstream = 3000, conf = 0.95,facet = "row")

![plotavg_list](https://user-images.githubusercontent.com/78794151/128011515-33ba61ba-f85c-47a7-96da-aeca86666714.png)

use plotGeneBody2 to draw a list of the TSS region

plotGeneBody2(files, type = "promoters", conf = 0.95,facet = "row", TxDb = txdb)

![plotgenebody_list](https://user-images.githubusercontent.com/78794151/128011621-5c5fb231-aeaf-4522-badd-f93b77d2936a.png)

4. the usage of ploting gene body

this is to draw the genebody

set the scaledlength = 8777 is to test this function

genebodymatrix <- getGenebodyMatrix(peak,weightCol = "V5", windows = genebody, scaledlength = 8777, binsize = 55)

plotGeneBody(gebodymatrix, conf=0.95, scaledlength = 8777, binsize = 55)


![plot_gene](https://user-images.githubusercontent.com/78794151/128011731-7e5f87b1-c9f7-4e2c-8c4f-8ac55c18bb34.png)
GuangchuangYu commented 3 years ago

有几个问题:

  1. promoter regions只有6000bp,你是如何切成了8000份?
  2. 8000是你切的份数,不是长度多少bp。特别是gene body,大家的长度各不相同,怎么都成了8000bp。你最好label的时候是以比例来(可以参考一下别的软件的实现。
  3. 我觉得你的参数是不是有点多而奇怪,难道不是一个box就完事了?

PS:在完善这个的同时,你可以着手开始尝试解决别的问题。