YuLab-SMU / ChIPseeker

:dart: ChIP peak Annotation, Comparison and Visualization
https://onlinelibrary.wiley.com/share/author/GYJGUBYCTRMYJFN2JFZZ?target=10.1002/cpz1.585
219 stars 74 forks source link

add new function and fix bugs #190

Closed MingLi-929 closed 2 years ago

MingLi-929 commented 2 years ago

update 2022.6.2

(1) combine gr and txdb parameter: users can pass self-made gr to txdb. (2) combine plotPeakProf3() , plotPeakProf2() and plotPeakProf3() : Users can use plotPeakProf() to call plotPeakProf2() and plotPeakProf3(). The usage lay below. -------------------------------------------------------------------------------> This pull request do following things: (1) fix several bugs (2) perfect the annotation of functions (3) add test files for getTagMatrix() and plotTagMatrix() (4) getBioRegion() can support UTR regions(3'UTR + 5'UTR), according to #189 (5) add new function makeBioRegionFromGranges() to support make windows from self-made granges object, according to #189.

  1. getBioRegion() can support UTR regions(3'UTR + 5'UTR)
    
    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    library(ChIPseeker)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene threeUTR <- getBioRegion(TxDb = txdb, by = "3UTR", type = "body")

length(threeUTR)

fiveUTR <- getBioRegion(TxDb = txdb, by = "5UTR", type = "body")

length(fiveUTR)

UTR <- getBioRegion(TxDb = txdb, by = "UTR", type = "body")

length(UTR) == length(threeUTR) + length(fiveUTR) [1] TRUE

Now,  `getBioRegion()` support UTR regions

2. add new function `makeBioRegionFromGranges()`
According to #189, users will need to investigate insulator or enhancer regions, which can not be attained through `txdb` object. We add new function `makeBioRegionFromGranges()` to support users investigate insulator or enhancer regions by using self-made granges object.

These self-made granges object served the same role as txdb object. For example, users can prepare a granges object containing enhancer regions. And `makeBioRegionFromGranges()` can use these self-made granges object to produce the windows("start_site", "end_site" and "body")

we consider transcript region as enhancer region

and make self-made granges object

they can be the same in the form of granges object

enhancer <- transcripts(txdb)[1:5000,]

we test three kinds of region, start_site, end_site and body

enhancer_body <- makeBioRegionFromGranges(gr = enhancer, by = "enhancer", type = "body")

enhancer_start <- makeBioRegionFromGranges(gr = enhancer, by = "enhancer", type = "start_site", upstream = 1000, downstream = 1000)

enhancer_end <- makeBioRegionFromGranges(gr = enhancer, by = "enhancer", type = "end_site", upstream = 1000, downstream = 1000)

Also, we modify the `getTagMatrix()` to call `makeBioRegionFromGranges()`.

previous getTagMatrix() can only use txdb object to produce function

now it support the self-made granges object through gr parameter

peak <- getSampleFiles()[[4]] txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

make the window by makeBioRegionFromGranges()

enhancer <- transcripts(txdb)[1:5000,]

enhancer_body <- makeBioRegionFromGranges(gr = enhancer, by = "enhancer", type = "body")

input the window made from self-made granges object

mt1 <- getTagMatrix(peak = peak, windows = enhancer_body, weightCol = "V5", nbin = 800)

without the window parameter

input the self-made granges object directly

mt2 <- getTagMatrix(peak = peak, gr = enhancer, weightCol = "V5", by = "enhancer", type = "body", upstream = rel(0.2), downstream = rel(0.2), nbin = 800)

input self-made granges object to make the start_site windows

mt3 <- getTagMatrix(peak = peak, weightCol = "V5", gr = enhancer, by = "gene", type = "start_site", upstream = 1000, downstream = 1000)

`plotPeakProf2()` is also adjusted to support self-made granges object

plotPeakProf2(peak = peak, by = "gene", type = "body", gr = enhancer)

plotPeakProf2(peak = peak, by = "gene", type = "body", gr = enhancer, upstream = rel(0.2), downstream = rel(0.2), nbin = 800)

yuanlizhanshi commented 2 years ago

Sorry to bother you. Could it be possible to accept two gr object inputs in plotPeakProf2, I want to compare the distribution of H3K4me1 in the enhancer region and non-enhancer region.

MingLi-929 commented 2 years ago

According to issue#192, plotPeakProf3() is created to received two or more windows. It can not only received self-made granges object, but also supported to use windows from txdb at the same time. In a word, Users can use self-made granges object to compare with the windows from txdb.

PS: this function do not support to compare region in different types. It should all be one of "start_site", "end_site" and "body".

the usage lay below:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ChIPseeker)
library(GenomicRanges)

peak <- getSampleFiles()[[4]]
peak_list <- getSampleFiles()[4:5]
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## accept two granges object
## compare the peak in enhancer regions and non-enhancer

## self-made enhancer region in the form of granges object
enhancer <- transcripts(txdb)[1:5000,]

## self-made non-enhancer region in the form of granges object
non_enhancer <- unlist(fiveUTRsByTranscript(txdb))[1:5000]

gr <- list(enhancer,non_enhancer)

plotPeakProf3(peak = peak,
              conf = 0.95,
              gr = gr,
              by = c("enhancer","non-enhancer"),
              windows_name = c("enhancer","non-enhancer"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000)

1

## users can remove the ribbon plot by omit the conf parameter
plotPeakProf3(peak = peak,
              gr = gr,
              by = c("enhancer","non-enhancer"),
              windows_name = c("enhancer","non-enhancer"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000)

2

## you can also input a list of peaks
plotPeakProf3(peak = peak_list,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","non-enhancer"),
              windows_name = c("enhancer","non-enhancer"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000)

3

## if users want to compare the regions obtained 
## from txbd and self-made granges object
## uses can input gr and txdb in the same time
plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000,
              TxDb = txdb)

4

## check the body region
plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              upstream = 1000,
              downstream = 1000,
              TxDb = txdb,
              nbin = 800)

5

plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              TxDb = txdb,
              nbin = 800)

6

plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              upstream = rel(0.2),
              downstream = rel(0.2),
              TxDb = txdb,
              nbin = 800)

7

plotPeakProf3(peak = peak_list,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              upstream = rel(0.2),
              downstream = rel(0.2),
              TxDb = txdb,
              nbin = 800)

8

yuanlizhanshi commented 2 years ago

Thank you very much.

Wish you all the best in your scientific research.

GuangchuangYu commented 2 years ago

the features are good, but I don't like fun1, fun2, fun3, etc.

We need to design a better API.

For the gr, passing the gr to the txdb parameter is OK, as we did this in annotatePeak.

MingLi-929 commented 2 years ago

update usage lay below:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ChIPseeker)
library(GenomicRanges)

peak <- getSampleFiles()[[4]]
peak_list <- getSampleFiles()[4:5]
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## accept two granges object
## compare the peak in enhancer regions and non-enhancer

## self-made enhancer region in the form of granges object
enhancer <- transcripts(txdb)[1:5000,]

## self-made non-enhancer region in the form of granges object
non_enhancer <- unlist(fiveUTRsByTranscript(txdb))[1:5000]

plotPeakProf(peak = peak,
             conf = 0.95,
             by = c("enhancer","non-enhancer"),
             windows_name = c("enhancer","non-enhancer"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000,
             TxDb = list(enhancer,non_enhancer))

## users can remove the ribbon plot by omit the conf parameter
plotPeakProf(peak = peak,
             TxDb = list(enhancer,non_enhancer),
             by = c("enhancer","non-enhancer"),
             windows_name = c("enhancer","non-enhancer"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000)

## you can also input a list of peaks
plotPeakProf(peak = peak_list,
             TxDb = list(enhancer,non_enhancer),
             conf = 0.95,
             by = c("enhancer","non-enhancer"),
             windows_name = c("enhancer","non-enhancer"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000)

## if users want to compare the regions obtained 
## from txbd and self-made granges object
## uses can input gr and txdb in the same time
plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000)

## check the body region
plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             upstream = 1000,
             downstream = 1000,
             nbin = 800)

plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             nbin = 800)

plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)

plotPeakProf(peak = peak_list,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)

plotPeakProf(peak = peak,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = txdb,
             nbin = 800)

# test a list of peak files
plotPeakProf(peak = peak_list,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = txdb,
             nbin = 800)
# test body region
# without extension
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = txdb,
             nbin = 800)

# extend with rel object
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = txdb,
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)

# extend with actual number
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = txdb,
             upstream = 1000,
             downstream = 1000,
             nbin = 800)

plotPeakProf(peak = peak,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = enhancer,
             nbin = 800)

# test a list of peak files
plotPeakProf(peak = peak_list,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = enhancer,
             nbin = 800)

# test body region
# without extension
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = enhancer,
             nbin = 800)

# extend with rel object
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = enhancer,
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)

# extend with actual number
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = enhancer,
             upstream = 1000,
             downstream = 1000,
             nbin = 800)