deeptools / deepTools

Tools to process and analyze deep sequencing data.

PlotProfile - Heatmap controversy #1081

Open ANDdna1991 opened 3 years ago

ANDdna1991 commented 3 years ago

Dear deepTools staff,

I really like deepTools; it's an awesome exploratory tool, so thank you for it and for your support. I use it extensively and have run into the following issue, which I can't explain. I was wondering whether you could shed some light on it.

I'm analysing an eU-seq (nascent RNA) dataset, and there is a clear difference between the two samples (visible in the genome browser and also in the heatmap; please see the attached image). However, the profile inexplicably does not reflect this difference: in cluster 1 it shows more signal in the sample that has less signal in the heatmap, and in the other clusters it shows only small differences. How can this be explained? I assume the same signal values are used for both the heatmap and the profile, right?

I'm using deeptools 3.5.0.

Screenshot 2021-07-29 at 13 11 23

Thanks a lot, Andres

dpryan79 commented 3 years ago

Can you post the command you used? There are a variety of possible reasons for this and having the command will narrow them down.

ANDdna1991 commented 3 years ago

Hi, thanks for the reply!

Here is the code (it's pretty standard).

computeMatrix scale-regions -S bigwig/SRR6177930.coverage.forward.bw bigwig/SRR6177931.coverage.forward.bw -R sorted_genes.bed --outFileName Eu_seq_IN_sorted_genes.matrix --regionBodyLength 100 --beforeRegionStartLength 500 --afterRegionStartLength 500 --binSize 5 --missingDataAsZero --numberOfProcessors 10

plotHeatmap --matrixFile Eu_seq_IN_sorted_genes.matrix --outFileName Eu_seq_IN_sorted_genes.matrix.heatmap.png --startLabel '' --endLabel '' --samplesLabel A B --plotTitle 'sorted genes'

I've been checking, and the problem seems to come from the first genes in cluster_1, but I really don't see a difference large enough to explain the change in scale between clusters 1 and 2.
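One way to check which genes carry the extreme values is to scan the matrix file directly. The sketch below is only an illustration, and it assumes the computeMatrix output is a gzip-compressed, tab-separated file whose first line starts with `@`, followed by one row per region with six BED-like columns before the per-bin values. The function names `top_regions` and `top_regions_from_file` are hypothetical and not part of deepTools.

```python
import gzip
import math


def top_regions(lines, n=5):
    """Return the n regions with the largest single-bin value.

    Assumes each data line has six BED-like columns (the fourth being
    the region name) followed by the per-bin signal values.
    """
    rows = []
    for line in lines:
        if line.startswith("@"):  # assumed header line with run parameters
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3]
        values = [float(v) for v in fields[6:]]
        finite = [v for v in values if not math.isnan(v)]
        if finite:
            rows.append((max(finite), name))
    rows.sort(reverse=True)  # largest maximum first
    return rows[:n]


def top_regions_from_file(path, n=5):
    """Same as top_regions, reading a gzip-compressed matrix file."""
    with gzip.open(path, "rt") as handle:
        return top_regions(handle, n)
```

Calling `top_regions_from_file("Eu_seq_IN_sorted_genes.matrix")` would then list the regions whose maximum bin value is largest, which are the likeliest candidates for pulling the mean profile upwards.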

Really Thanks, Andres

dpryan79 commented 3 years ago

@ANDdna1991 Can you try using the --averageTypeSummaryPlot median with plotProfile? I recall that there's some clipping that happens in the heatmap so outliers don't make the plots uninterpretable and I wonder if that's masking the huge change in the first gene that's driving the change in the mean level that you're seeing in the profile. The median should be much more robust to that.
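The effect described above can be reproduced with a toy example. This is a minimal sketch using made-up numbers, not the data from this issue, and the `clip` helper is a hypothetical stand-in for a saturated colour scale rather than deepTools' actual clipping logic. It shows how a single extreme gene can push the mean of one sample above the other even though most genes, and the median, point the opposite way.

```python
from statistics import mean, median

# Made-up per-gene signal at one bin for two samples.
# Sample A is higher in almost every gene; sample B has one extreme gene.
sample_a = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0]
sample_b = [2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 500.0]


def clip(values, upper):
    """Cap values at `upper`, as a saturated colour scale would display them."""
    return [min(v, upper) for v in values]


summary = {
    "mean_a": mean(sample_a),
    "mean_b": mean(sample_b),
    "median_a": median(sample_a),
    "median_b": median(sample_b),
}

# What the heatmap would effectively show for sample B if values above 10
# were drawn with the same colour as 10.
clipped_b = clip(sample_b, upper=10.0)

print(summary)
print("mean of clipped B:", mean(clipped_b))
```

Here the mean of sample B exceeds that of sample A only because of the one outlier, whereas the median and the clipped values both rank A above B, which matches a heatmap that looks stronger for A next to a mean profile that looks stronger for B.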

ANDdna1991 commented 3 years ago

Thanks!

You're right, the median should be less affected by these outliers. With --averageTypeSummaryPlot median the plot looks "nicer" (see A), so thanks! However, I have found an issue with the clustering option: when --kmeans is used, --averageTypeSummaryPlot median seems to be ignored and the default is used instead (see B).

(A)

Screenshot 2021-07-30 at 12 34 09

plotHeatmap --matrixFile Eu_seq_IN_sorted_genes.matrix --outFileName Eu_seq_IN_sorted_genes.matrix.heatmap.png --startLabel '' --endLabel '' --averageTypeSummaryPlot 'median' --sortRegion descend --samplesLabel A B --plotTitle 'sorted genes'

(B)

Screenshot 2021-07-30 at 12 34 19

plotHeatmap --matrixFile Eu_seq_IN_sorted_genes.matrix --outFileName Eu_seq_IN_sorted_genes.matrix.heatmap.png --startLabel '' --endLabel '' --averageTypeSummaryPlot 'median' --kmeans 3 --sortRegion descend --samplesLabel A B --plotTitle 'sorted genes'

Thanks, Andres

dpryan79 commented 3 years ago

Can you upload the matrix somewhere? It'll be easier to look into this with it.