YinLiLin / CMplot

📊 Circular and Rectangular Manhattan Plot

Multi-trait Manhattan plot: Reverse point sampling #64

Closed mschilli87 closed 2 years ago

mschilli87 commented 2 years ago

When plotting multiple traits in a single Manhattan plot, overplotting is unavoidable. CMplot dampens its effect by randomly sampling 1000 points from each trait to plot at a time. However, it starts with all traits in the first chunk of points to plot and removes a trait from the sampling (and thus plotting) list once it has been 'depleted' (i.e. all points for that trait have been plotted). Thus, if one trait has many more (non-NA) points than the other(s), it can dominate the last chunks of points plotted, resulting in a visible overplotting 'bias' towards that trait.

This PR addresses this issue by inverting the plot (or rather: sampling) order: 'Larger' traits are preferred in the initial chunk(s) of points to plot until equal numbers of points remain to be plotted for each trait. This way, the 'extra' points accumulate in the background instead of the foreground, removing the visible 'bias' caused by the overplotting.
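To make the effect of the ordering concrete, here is a minimal sketch of the idea, not the code merged into CMplot. The helper `chunk_schedule()` and the point counts are hypothetical; it only tabulates how many points each trait would contribute per plotting round, assuming chunks of 1000 points. For simplicity, the reversed variant just flips the forward schedule, which may differ in detail from the actual change.

```r
# Hypothetical helper (not part of CMplot): for each plotting round, return
# how many points of each trait are drawn, given the number of non-NA points
# per trait and the chunk size.
chunk_schedule <- function(n_points, chunk = 1000, reverse = FALSE) {
  rounds <- max(ceiling(n_points / chunk))
  # Forward schedule: every trait contributes up to `chunk` points per round
  # until it is depleted, so small traits drop out early.
  sched <- sapply(n_points, function(n) {
    pmin(chunk, pmax(0, n - chunk * (seq_len(rounds) - 1)))
  })
  sched <- matrix(sched, nrow = rounds,
                  dimnames = list(NULL, names(n_points)))
  # Reversed schedule: flip the round order so that the rounds containing
  # only the 'larger' traits are drawn first, i.e. end up in the background.
  if (reverse) sched <- sched[rev(seq_len(rounds)), , drop = FALSE]
  sched
}

# Made-up point counts, for illustration only.
n_points <- c(trait1 = 2000, trait2 = 10000, trait3 = 500)
forward  <- chunk_schedule(n_points)
reversed <- chunk_schedule(n_points, reverse = TRUE)

# Last round drawn (the foreground) under each ordering.
forward[nrow(forward), ]
reversed[nrow(reversed), ]
```

In the forward schedule, only the largest trait is left in the final rounds and therefore sits on top of everything else; in the reversed schedule, the final round contains points from every trait.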

Here is an example:

# Get latest development version of `CMplot` and `pig60K` example data.
library(CMplot)
source("https://raw.githubusercontent.com/YinLiLin/CMplot/fe3b0ed0130bac60d61cb23aaec778435c8d1bce/R/CMplot.r")
data(pig60K)

# Create multi-track Manhattan plot with default parameters (for reference).
set.seed(42)
CMplot(pig60K, plot.type="m", multracks=TRUE, file.output=FALSE)

multi-traits Manhattan plot using unmodified example data and current master branch

# Randomly drop p-values for most points in two out of three traits.
set.seed(42)
pig60Kmod <- pig60K
n <- nrow(pig60Kmod)
na1 <- sample(1:n, as.integer(.8 * n))
na3 <- sample(1:n, as.integer(.95 * n))
pig60Kmod$trait1[na1] <- NA
pig60Kmod$trait3[na3] <- NA

# Observe 'larger' trait visually dominate the plot.
set.seed(42)
CMplot(pig60Kmod, plot.type="m", multracks=TRUE, file.output=FALSE)

multi-traits Manhattan plot using modified example data and current master branch

# Repeat the same plot with reversed sampling order as suggested in
# this PR.
source("https://raw.githubusercontent.com/YinLiLin/CMplot/a62c829fea8c4d74b609fcefb3dd8a73895ade26/R/CMplot.r")
set.seed(42)
CMplot(pig60Kmod, plot.type="m", multracks=TRUE, file.output=FALSE)

multi-traits Manhattan plot using modified example data and code suggested here

# Repeat plot with unmodified example data to rule out any unwanted
# side-effects of the reversed sampling.
set.seed(42)
CMplot(pig60K, plot.type="m", multracks=TRUE, file.output=FALSE)

multi-traits Manhattan plot using unmodified example data and code suggested here

YinLiLin commented 2 years ago

Thank you very much for your time and efforts on updating CMplot. Pretty helpful, it's now merged with the main branch.