Open simonelsasser opened 4 years ago
Nice! I did not know about this option. Right now the plotting functions do not alter the axes; it's up to the user to adjust them later on. I will keep this issue open as a reminder, and will close it once this is implemented as an option, or if I include some quantile trimming as Rui was suggesting.
The outliers rarely come from biological reads; they mostly consist of (G|C) simple repeats (sequencing errors).
I would expect them to be removed before bamCoverage in the pipeline (as I understand it).
Quantile trimming shifts the outliers to the value of the indicated quantile, which is a compromise when the source of the error is unknown.
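A minimal base R sketch of what such quantile trimming could look like. The helper name `cap_at_quantile`, the `prob` argument, and the coverage values are made up for illustration and are not part of the package:

```r
# Hypothetical helper: clamp every value above the chosen quantile
# to the value of that quantile (values below it are left untouched).
cap_at_quantile <- function(x, prob = 0.99) {
  upper <- quantile(x, probs = prob, na.rm = TRUE, names = FALSE)
  pmin(x, upper)
}

# Made-up per-bin coverage with one artefact-like spike.
coverage <- c(1, 2, 2, 3, 3, 4, 5, 250)
capped <- cap_at_quantile(coverage, prob = 0.9)
```

Note that the spike is only pulled down to the quantile value, it is not removed, which is why this is a compromise rather than a fix for the artefact itself.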
It's better to track down the read artefacts. Last time I simply purged any coverage from stacks of more than 16 reads, since even for H3K4me3 stacks deeper than 12 reads are too rare if read overlap follows a Poisson distribution.
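A rough sketch of that reasoning in base R. The mean stack depth `lambda` and the coverage vector are assumed values for illustration only; the real rate depends on sequencing depth and the mark, and "purging" is interpreted here as dropping the affected bins:

```r
# Assumed mean number of overlapping reads per position (made-up value).
lambda <- 2

# Upper-tail probabilities under Poisson(lambda): P(X > 12) and P(X > 16).
p_over_12 <- ppois(12, lambda, lower.tail = FALSE)
p_over_16 <- ppois(16, lambda, lower.tail = FALSE)

# Made-up per-bin coverage; drop every bin whose stack exceeds the cutoff.
coverage <- c(0, 1, 3, 2, 40, 5, 17, 16)
max_stack <- 16
kept <- coverage[coverage <= max_stack]
```

Setting the offending bins to `NA` instead of dropping them would keep the vector aligned with genomic positions, if that matters downstream.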
@cnluzon we had this discussion about the shortcomings of ylim and also scale_y_continuous: they draw a wrong box plot and give wrong statistics, because data outside the plotted range is dropped and not used in the calculation. I knew I had come across a solution before, and it is
coord_cartesian(ylim=c(0, 3))
which only adjusts the visible coordinates, not the underlying data. So if you implement an option to choose x- or y-axis limits, please use this approach so that inexperienced users don't make mistakes.
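To make the difference concrete, here is a small sketch assuming ggplot2 is installed. The data frame and variable names are made up; `ggplot_build()` is used only to read back the median that the box plot actually draws:

```r
library(ggplot2)

# Made-up data with one extreme value.
df <- data.frame(group = "a", value = c(1, 2, 3, 4, 100))

base <- ggplot(df, aes(group, value)) + geom_boxplot()

# Scale limits: values outside 0..3 are dropped (with a warning)
# before the box plot statistics are computed.
p_scale <- base + ylim(0, 3)

# Coordinate limits: the statistics use all data, only the view is zoomed.
p_coord <- base + coord_cartesian(ylim = c(0, 3))

# Median as drawn by each plot.
median_scale <- ggplot_build(p_scale)$data[[1]]$middle
median_coord <- ggplot_build(p_coord)$data[[1]]$middle
```

The two medians differ because `ylim()` computes the box from the remaining in-range values only, while `coord_cartesian()` keeps the statistics of the full data.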