deeptools / deepTools

Tools to process and analyze deep sequencing data.

--xxxThreshold for plotHeatmap? #359

Closed MWSchmid closed 8 years ago

MWSchmid commented 8 years ago

I think it would be nice to have the "--minThreshold" and "--maxThreshold" arguments in plotHeatmap as well. Running computeMatrix with multiple samples can take quite a long time, and being able to set the thresholds only during plotting would make it quicker to find a sensible value (to exclude abnormal regions).

As an aside: I assume that the sorting in plotHeatmap uses all data/samples for a given region at once, but it isn't entirely clear to me from the documentation whether this is true (the alternative would be that each sample is sorted separately, meaning that one line along the plot would not reflect one region, but many).

dpryan79 commented 8 years ago

Hi Marc,

The --zMin and --zMax options largely do this already.

Regarding sorting: yes, it uses all samples in each group at once. Otherwise the samples wouldn't get sorted together and the resulting plot would be difficult to interpret.

Devon

Devon Ryan, Ph.D. Email: dpryan@dpryan.com Data Manager/Bioinformatician Max Planck Institute of Immunobiology and Epigenetics Stübeweg 51 79108 Freiburg Germany devon.ryan@dzne.de
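
To illustrate what "sorted together" means, here is a minimal sketch in plain Python. It is not the deepTools implementation: the toy data, the `joint_order` helper, and the choice of a descending mean as the per-region score are all assumptions made for illustration. The point is only that one ordering is computed from all samples and then applied to every sample, so a given row refers to the same region in each panel.

```python
from statistics import mean

# Hypothetical toy data: one matrix per sample, rows = regions, columns = bins.
samples = {
    "sampleA": [[1, 1], [9, 9], [4, 4]],
    "sampleB": [[2, 2], [8, 8], [0, 0]],
}

def joint_order(samples):
    """Return region indices sorted by descending mean signal over all samples.

    Illustrative helper only; the scoring rule is an assumption.
    """
    n_regions = len(next(iter(samples.values())))
    scores = [
        mean(v for rows in samples.values() for v in rows[i])
        for i in range(n_regions)
    ]
    return sorted(range(n_regions), key=lambda i: scores[i], reverse=True)

# One ordering, applied identically to every sample.
order = joint_order(samples)
sorted_samples = {name: [rows[i] for i in order] for name, rows in samples.items()}
```

Sorting each sample separately would instead give each panel its own row order, which is the hard-to-interpret alternative described in the question.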


MWSchmid commented 8 years ago

Hi Devon

Right, but the range of the heatmap legend/values seems to be quite sensible anyway, even with extreme outliers (i.e. regions with very high coverage at a few spots). This is in contrast to the profile, where the outliers cause quite massive peaks. --yMin and --yMax are an option as well, but removing the outlier regions entirely would be nicer, as the profile quickly becomes uninformative if there are several "outlier peaks".

-.-

I guess that setting --averageTypeSummaryPlot to median instead of mean would solve the problem as well (as long as the coverage is high enough)...

^^

Best regards,

Marc
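
A small sketch of why switching the summary statistic from mean to median can help, using made-up numbers (the values in `bin_values` are hypothetical and not taken from any real matrix). A single region with very high coverage dominates the mean of a bin, while the median stays close to the typical value.

```python
from statistics import mean, median

# Hypothetical coverage values for one bin across ten regions;
# the last region is an outlier with very high coverage.
bin_values = [2, 3, 2, 4, 3, 2, 3, 4, 3, 500]

bin_mean = mean(bin_values)      # pulled up strongly by the single outlier
bin_median = median(bin_values)  # stays near the typical coverage

print(bin_mean, bin_median)
```

As noted above, this only works when coverage is high enough; with many zero-valued bins the median may collapse to zero.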

dpryan79 commented 8 years ago

If median doesn't suffice, then filter the output of computeMatrix instead. This is very fast. You can find an example of a script for doing that in the develop branch under scripts/subsetMatrix. There's a much larger version of that in the feature/riboseq_352 branch if you need further examples.
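
For readers who want a rough idea of what such filtering could look like, below is a hedged sketch. It is not the scripts/subsetMatrix script mentioned above. It assumes, without verifying against a particular deepTools version, that the matrix text starts with one header line consisting of `@` followed by JSON containing a `group_boundaries` list, and that each remaining line holds six BED-like columns followed by tab-separated numeric values. The function name `filter_matrix_lines` and its parameters are hypothetical.

```python
import json
import math

def filter_matrix_lines(lines, min_threshold=None, max_threshold=None):
    """Drop regions containing a value below min_threshold or above max_threshold.

    Assumed layout (an assumption, see the text above): lines[0] is '@' + JSON
    with a 'group_boundaries' list; every other line has six BED-like columns
    followed by numeric values, tab-separated.
    """
    header = json.loads(lines[0][1:])
    bounds = header["group_boundaries"]
    kept, new_bounds = [], [0]
    for g in range(len(bounds) - 1):
        # Walk the regions of one group at a time so boundaries can be rebuilt.
        for line in lines[1 + bounds[g]: 1 + bounds[g + 1]]:
            values = [float(v) for v in line.rstrip("\n").split("\t")[6:]]
            values = [v for v in values if not math.isnan(v)]  # ignore NaN bins
            if min_threshold is not None and values and min(values) < min_threshold:
                continue
            if max_threshold is not None and values and max(values) > max_threshold:
                continue
            kept.append(line)
        new_bounds.append(len(kept))
    header["group_boundaries"] = new_bounds
    return ["@" + json.dumps(header) + "\n"] + kept

# Toy input: two groups (two regions, then one region); r2 has an extreme value.
lines = [
    "@" + json.dumps({"group_boundaries": [0, 2, 3]}) + "\n",
    "chr1\t0\t10\tr1\t.\t+\t1.0\t2.0\n",
    "chr1\t20\t30\tr2\t.\t+\t1.0\t900.0\n",
    "chr2\t0\t10\tr3\t.\t+\tnan\t3.0\n",
]
filtered = filter_matrix_lines(lines, max_threshold=100)
```

For a real gzipped matrix, the lines could be read with `gzip.open(path, "rt")` and written back the same way with mode `"wt"`. Because only text lines are scanned, this avoids rerunning computeMatrix while trying out different thresholds.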