ataudt / chromstaR

Combinatorial and differential ChIP-seq analysis in R
8 stars 3 forks source link

Error in plotFoldEnrichment #6

Closed cnova1126 closed 7 years ago

cnova1126 commented 7 years ago

Dear Aaron,

In addition to an issue with binning, I am also having a problem regarding the plotFoldEnrichment function. I have attached both the code and associated plot for both counts and combinations. I am not sure why H3K4me3 is displayed properly on the count-based plot but not on the combination-based plot? Thanks again for you time and assistance!

plotFoldEnrichment(hmm = model, annotation = genes, bp.around.annotation = 15000, what = counts) + ggtitle("Fold Enichment within Gene Body") + xlab('Distance from Gene Body in bp')

plotFoldEnrichment(hmm = model, annotation = genes, bp.around.annotation = 15000, what = combinations) + ggtitle("Fold Enichment within Gene Body") + xlab('Distance from Gene Body in bp')

combinationplot countplot

ataudt commented 7 years ago

Dear cnova,

thank you for the bug report. Can you tell me which version of chromstaR you are using? You can find this by typing sessionInfo() in R. I made some major changes to the code three weeks ago to implement new functionality and probably missed to change something in the plotting function.

Regards, Aaron

cnova1126 commented 7 years ago

Hi Aaron,

Sure no problem. chromstaR_1.0.0

In regards to my other post: I am also trying the new cell mark file and will let you know how it works. Thanks for your help!

Best, Chris

ataudt commented 7 years ago

Hi Chris,

Bioconductor has released a new version recently: http://bioconductor.org/packages/release/bioc/html/chromstaR.html This is chromstar 1.2.0 and might solve the issue that you observed. If you do install the new version, make sure to rerun the complete analysis from scratch. Also, the function plotFoldEnrichment() has been renamed to plotEnrichment(). See all changes here: http://bioconductor.org/packages/release/bioc/news/chromstaR/NEWS

Best, Aaron

cnova1126 commented 7 years ago

Hi Aaron,

I have updated chromstaR to version 1.2.0 and am still getting an error regarding combinatorial-plotting. While I get some errors, plotEnrichment seems to work with both counts and peaks, but not with combinations. I have attached the entire code in case I am making an error regarding obtaining annotations, etc.

load libraries

library(chromstaR) library(biomaRt)

set path to input and output folders

inputfolder <- ("/Volumes/TCGA_CCLE_SKCM/TCGA_CCLE_SKCM_Pipeline_Runs_Apr2017/SKCM_FINAL_RUNS/TCGA-CCLE_downsampled_BAM/") outputfolder <- ("/Volumes/TCGA_CCLE_SKCM/TCGA_CCLE_SKCM_Pipeline_Runs_Apr2017/SKCM_FINAL_RUNS/TCGA-CCLE_downsampled_BAM/TCGA-SKCM_chromstaR_Combo_RAS/")

read in cell mark file

H3K4me3_H3K27me3 <- read.table(file = "TCGA_SKCM_H3K4me3andH3K27me3_BRAF.txt", sep = "\t", header = TRUE)

run chromstaR using mode = 'combinatorial', 'differential', 'full'

Chromstar(inputfolder, experiment.table = H3K4me3_H3K27me3, outputfolder = outputfolder, mode = 'combinatorial', assembly = 'hg19')

load output files for downstream processing

model <- get(load(file.path(outputfolder,'multivariate', 'multivariate_mode-combinatorial_condition-RAS.RData')))

get genomicFrequencies

genomicFrequencies(model)

transition probability

heatmapTransitionProbs(model) + ggtitle('Transition Probability')

read count correlation

heatmapCountCorrelation(model) + ggtitle('Read Count Correlation')

Enrichment Analysis

obtain annotations

ENS <- useMart('ENSEMBL_MART_ENSEMBL') listDatasets(ENS) ENS <- useMart('ENSEMBL_MART_ENSEMBL', dataset = "hsapiens_gene_ensembl")

get genes

genes <- getBM(attributes = c('ensembl_gene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'gene_biotype'), mart = ENS)

genes <- GRanges(seqnames = paste0('chr',genes$chromosome_name), ranges = IRanges(start = genes$start_position, end = genes$end_position), strand = genes$strand, biotype = genes$gene_biotype)

gene body plot

plotEnrichment(hmm = model, annotation = genes, bp.around.annotation = 15000, what = 'combinations') + ggtitle("Fold Enichment within Gene Body") + xlab('Distance from Gene Body in bp')

TSS plot

plotEnrichment(model, genes, region = 'start', what = 'combinations') + ggtitle("Fold Enichment Around TSS") + xlab('Distance from TSS in bp')

TSS facet plot

plotFoldEnrichment(model, genes, region = 'start') + facet_wrap(~ combination)

TSS heatmap

tss <- resize(genes, width = 3, fix = 'start') plotEnrichCountHeatmap(model, tss) + theme(strip.text.x = element_text(size = 6)) + ggtitle('Read counts around TSS')

Biotype Plot

biotype <- split(tss, tss$biotype) plotFoldEnrichHeatmap(model, annotations = biotype) + coord_flip() + ggtitle('Fold enrichment within Biotype')

count-based plot

bivalent_counts

peak-based plot

bivalent_peaks

combination-based plot

bivalent_combinations

I definitely have bivalent targets in this dataset as I have identified using other methods. Also, looking at genomic frequency there seems to be a high level of bivalent enrichment.

genomicFrequencies(model)

$frequency

            []          [H3K4me3]         [H3K27me3] [H3K4me3+H3K27me3] 
   0.681691040        0.066196097        0.005565173        0.246547690 

Thanks again for your help!

Best, Chris

ataudt commented 7 years ago

Hi Chris, I have checked the function and cannot find a bug. However, I have noticed the following things in your plots: H3K4me3-RAS-rep2 and H3K4me3-RAS-rep4 don't seem to have enrichment at TSS. Are these indeed replicates in condition RAS, or do they belong to a different condition? Also, you do have enrichment of bivalent domains at the TSS. However, the enrichment is quite weak. This is not an error, because what the plot shows is the ratio of observed over expected enrichment. In your case, you have a high abundance of bivalent domains, and they seem to be everywhere, not just at TSS, therefore the fold enrichment is quite weak.

Best, Aaron

cnova1126 commented 7 years ago

Hi Aaron, The example is from older data and I have subsequently removed these samples. And yes, I knew I had a large amount of bivalent domains in my dataset, but it is interesting to know the majority are not localized within the TSS or gene body. I will try these functions with another dataset (i.e Active Enhancers) and let you know how it works.

Thanks again! Best, Chris