deeptools / deepTools

Tools to process and analyze deep sequencing data.

To understand "plotProfile" output. #959

Closed: Swadha22 closed this issue 4 years ago.

Swadha22 commented 4 years ago

Python 3.6.10, deepTools 3.1.3

Hi, I have ChIP data for two histone variants (HTZ-1 and HTAS-1) in C. elegans. I want to see whether they are enriched around the promoter/TSS or across gene bodies. These are the steps I followed:

1- I first mapped my reads to the reference genome with Bowtie (the reads are single-end and 50 bp long, and I wanted to use the -m 2 parameter). Only ~50% of the reads mapped, which is a pretty low mapping rate.
2- I then used bamCoverage to convert the BAM files to bedGraph files with a bin size of 1, using this command: bamCoverage -of bedgraph --bam $line -o $line.bedgraph --binSize 1
3- I converted the bedGraphs to bigWigs: bedGraphToBigWig $line ce.chrome.sizes $line.bw
4- I then ran computeMatrix in both scale-regions and reference-point mode to look at the enrichment.

computeMatrix scale-regions --beforeRegionStartLength 1000 --afterRegionStartLength 5000 -R Caenorhabditis_elegans.WBcel235.94.gtf -S HTAS1.C.sam.bedgraph.bw HTZ1.sam.bedgraph.bw Input.S41B..sam.bedgraph.bw -o Replication_1 --binSize 1 --startLabel TSS --endLabel 5000 --samplesLabel HTAS-1.C HTZ-1 Input

plotProfile -m Replication_1 -out Replication_1.png --perGroup --colors red yellow blue --plotWidth 20

[image: scale-regions profile plot]

computeMatrix reference-point -R Caenorhabditis_elegans.WBcel235.94.gtf -S HTAS1.C.sam.bedgraph.bw HTZ1.sam.bedgraph.bw Input.sam.bedgraph.bw --referencePoint TSS -b 1000 -a 5000 --binSize 1 --samplesLabel HTAS-1.C HTZ-1 Input -o Replication_1

plotProfile -m Replication_1 -out Replication_1.png --perGroup --colors red yellow blue --plotTitle "Rep-1" --plotWidth 20

[image: reference-point profile plot]
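To make the reference-point step concrete, here is a minimal Python sketch of pulling a -b 1000 / -a 5000 window around each TSS out of per-base values. The function name `tss_window` and the coordinate conventions (0-based, end-exclusive genes, TSS taken as the last base of a minus-strand gene, None for positions off the chromosome) are assumptions for illustration, not deepTools' own code. The key idea is that minus-strand windows are read in reverse so that upstream always lands on the left of the profile, as computeMatrix does when it orients regions 5' to 3'.

```python
def tss_window(coverage, gene_start, gene_end, strand, before=1000, after=5000):
    """Per-base values from `before` bp upstream to `after` bp downstream of the TSS.

    Assumed conventions (hypothetical): gene_start/gene_end are 0-based and
    end-exclusive; the result is ordered upstream -> downstream, with the TSS
    at index `before`; positions outside the chromosome are returned as None.
    """
    n = len(coverage)
    if strand == "+":
        tss = gene_start
        positions = range(tss - before, tss + after)
    else:
        # On the minus strand the TSS is the gene's last base, upstream lies at
        # higher coordinates, so walk the window right-to-left.
        tss = gene_end - 1
        positions = range(tss + before, tss - after, -1)
    return [coverage[p] if 0 <= p < n else None for p in positions]


coverage = list(range(20))  # toy per-base values: value == position
print(tss_window(coverage, 5, 15, "+", before=2, after=3))  # [3, 4, 5, 6, 7]
print(tss_window(coverage, 5, 15, "-", before=2, after=3))  # [16, 15, 14, 13, 12]
```

Because both strands are aligned this way, the TSS sits in the same column for every gene, which is what allows the rows to be averaged into one profile.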

The scale-regions and reference-point modes both give me similar results.

QUESTION: From issue #682 I understand that each region given in the command is divided into bins of the chosen bin size, the per-base values in each bin are extracted (and scaled), and an average is computed for each bin.
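In scale-regions mode the extra step is that every gene body, whatever its length, is stretched or squeezed onto the same number of bins before averaging. The Python sketch below is a simplification, not deepTools' actual implementation: it assigns each base to one of `n_bins` equal bins and averages within each. The name `scale_to_bins` is hypothetical.

```python
def scale_to_bins(values, n_bins):
    """Average per-base values into n_bins equally sized bins.

    Base i goes to bin floor(i * n_bins / len(values)), so a gene of any
    length yields exactly n_bins values. Bins that receive no bases
    (possible when the gene is shorter than n_bins) are None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, v in enumerate(values):
        b = i * n_bins // len(values)
        sums[b] += v
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


print(scale_to_bins([1, 1, 3, 3, 5, 5, 7, 7], 4))  # [1.0, 3.0, 5.0, 7.0]
print(scale_to_bins([0, 0, 0, 6, 6, 6], 2))        # [0.0, 6.0]
```

Two genes of different lengths both come out with `n_bins` values, so they can be stacked into one regions-by-bins matrix. With `--binSize 1` and the default scaled body length of 1000 bp (`--regionBodyLength`), each gene body is split into 1000 such bins.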

I'd like help interpreting the graphs I generated. I had expected the Input profile to show the lowest signal and sit at the bottom of the plot, so that I could compare the enrichment/incorporation of HTAS-1.C and HTZ-1 around the TSS and across the gene body. What does the Y-axis mean? Do my graphs show that HTAS-1.C is three times more enriched than HTZ-1 at the TSS and in the gene body?

Did I do something wrong when generating the graphs? I have three biological replicates of this ChIP data and all of them look crappy. Do you have any suggestions on how to analyze them effectively? Any help would be greatly appreciated.

I like the example plot (snapshot below) on your webpage that compares the enrichment of three histone marks, and I want to make a similar plot for my data. I feel like my samples are not on the same scale, so I can't compare their enrichment.

[image: example profile plot from the deepTools webpage]

Thanks -S

dpryan79 commented 4 years ago

You can create bigWig files directly with bamCoverage; this is simpler and more flexible than relying on the UCSC tools.
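For reference, with `--binSize 1` and no normalization, each value bamCoverage writes is essentially the number of reads covering that base. Below is a minimal Python sketch of that count using a difference array. It deliberately ignores read extension, duplicate and MAPQ filtering; the reads and the function name `per_base_coverage` are hypothetical.

```python
from itertools import accumulate


def per_base_coverage(reads, chrom_length):
    """Count how many reads cover each base of one chromosome.

    reads: iterable of (start, end) alignments, 0-based and end-exclusive.
    Simplified sketch: no read extension, duplicate or MAPQ filtering.
    """
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        diff[start] += 1  # coverage rises where a read begins
        diff[end] -= 1    # and drops just past where it ends
    # The running sum of the difference array is the per-base depth.
    return list(accumulate(diff))[:chrom_length]


# Three hypothetical 50 bp single-end reads on a 200 bp toy chromosome
cov = per_base_coverage([(10, 60), (30, 80), (150, 200)], 200)
print(cov[5], cov[40], cov[70], cov[175])  # 0 2 1 1
```

Because these are raw counts, a library sequenced twice as deeply shows roughly twice the coverage everywhere, which is why unnormalized tracks from different samples are hard to compare.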

You haven't specified any normalization with bamCoverage, so you're just seeing average coverage, which won't be useful for you. You would be better off using RPGC normalization to scale to 1x average coverage, which should put the samples at similar overall background levels. Better still would be to normalize your ChIP samples to the input sample with bamCompare.
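The two normalizations mentioned here can be sketched in a few lines of Python. RPGC divides raw coverage by (mapped reads times fragment length) / effective genome size, the depth expected if reads were spread evenly, so the genome-wide average becomes about 1x; in deepTools 3.x this is requested with `bamCoverage --normalizeUsing RPGC --effectiveGenomeSize ...`. bamCompare's default output is a per-bin log2 ratio of ChIP over input with a pseudocount of 1. The helper names and the numbers below are hypothetical, and this sketch skips the read-count scaling that bamCompare applies to the two files before taking the ratio.

```python
import math


def rpgc_factor(mapped_reads, fragment_length, effective_genome_size):
    """Multiplier that rescales raw coverage to 1x average genome coverage."""
    # Depth the library would give if all sequenced bases were spread evenly.
    expected_depth = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / expected_depth


def log2_ratio(chip, control, pseudocount=1.0):
    """Per-bin log2(ChIP / input); the pseudocount avoids dividing by zero."""
    return [math.log2((c + pseudocount) / (ctl + pseudocount))
            for c, ctl in zip(chip, control)]


raw = [2, 4, 6]
factor = rpgc_factor(4_000_000, 50, 100_000_000)  # hypothetical library and genome size
print([x * factor for x in raw])                  # [1.0, 2.0, 3.0]
print(log2_ratio([3, 0, 7], [1, 0, 1]))           # [1.0, 0.0, 2.0]
```

A library with twice as many mapped reads gets half the factor, so both end up on the same 1x scale. On the log2 scale, 1 means twice the input signal and 0 means no enrichment over input.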

The Y-axis in the plots is whatever the bigWig files contain, nothing more. In your case the files contain coverage, so "average coverage" would be a reasonable axis label.
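By default, plotProfile's curve is the mean over all regions of each matrix column (`--averageType mean`), so its units are exactly the units of the bigWig values. Here is a pure-Python sketch of that column-wise mean. It assumes missing bins are skipped rather than counted as zero, and `profile` is a hypothetical name.

```python
def profile(matrix):
    """Column-wise mean of a regions x bins matrix, skipping missing (None) cells."""
    n_bins = len(matrix[0])
    means = []
    for j in range(n_bins):
        column = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(column) / len(column) if column else None)
    return means


# Two hypothetical regions, three bins each; the second region lacks data in bin 2
matrix = [[1, 2, 3],
          [3, None, 5]]
print(profile(matrix))  # [2.0, 2.0, 4.0]
```

If the bigWigs hold raw coverage, a curve three times higher can simply reflect sequencing depth. Once the tracks are RPGC-normalized or expressed as log2(ChIP/input), the heights become comparable across samples.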

Swadha22 commented 4 years ago

Hey, thanks so much for getting back to me! This makes much more sense now. I am now normalizing my ChIP samples to the input (using bamCompare).

Do I also need to normalize the samples again (1x or z-score normalization) to bring them to a similar overall background, or is normalizing to the input enough?

Thanks S

dpryan79 commented 4 years ago

Normalizing to input should suffice.