LKremer / ggpointdensity

Introduces geom_pointdensity(): A Cross Between a Scatter Plot and a 2D Density Plot.
GNU General Public License v3.0

Normalized density for facetted plots #1

Closed seasmith closed 4 years ago

seasmith commented 5 years ago

Would it make sense to add a computed statistic that gives the normalized (relative) number of neighbors, i.e. each point's neighbor count divided by the maximum neighbor count in its group?

# i.e.
data$r_neighbors <- data$n_neighbors / max(data$n_neighbors)
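
To make the difference between global and per-group normalization concrete, here is a minimal base R sketch. The data frame, the `facet` column, and the `r_global` / `r_facet` names are made up for illustration and are not part of ggpointdensity.

```r
# Toy data: hypothetical neighbor counts for points in two facets
data <- data.frame(
  facet = c("a", "a", "a", "b", "b", "b"),
  n_neighbors = c(10, 50, 100, 2, 4, 8)
)

# Global normalization: one maximum shared by all facets
data$r_global <- data$n_neighbors / max(data$n_neighbors)

# Per-facet normalization: each facet is scaled by its own maximum
data$r_facet <- ave(data$n_neighbors, data$facet, FUN = function(n) n / max(n))

print(data)
```

With global normalization the sparse facet "b" never gets above 0.08, so its points would all map to the low end of a shared color scale. With per-facet normalization each facet spans the full range up to 1.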
LKremer commented 5 years ago

Yes it would make sense in some cases! I wouldn't want to make this the default behavior because I think raw neighbor counts are a bit more intuitive than relative ones, but for facetted plots I see how it can be useful. Maybe something like geom_pointdensity(relative=TRUE) would be worthwhile?

seasmith commented 5 years ago

Returning a computed stat would bring the function's behavior in line with other ggplot2 functions (e.g. stat_density_2d returns density, ndensity, level, and nlevel).

# Example
library(ggplot2)
library(ggpointdensity)

ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(r_neighbors)))

I feel density and ndensity are more in line with ggplot2 and would make the function more extensible (e.g. if the function accepted something like method = "kde2d" for 2D kernel density or method = "bkde2d" for 2D binned kernel density).

# Example

# Default method would be 'nn'
ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(ndensity)), method = "nn")

# kernel-density
ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(ndensity)), method = "kde2d")

# binned kernel-density
ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(ndensity)), method = "bkde2d")
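
As a rough illustration of how such a `method` switch could return both `density` and `ndensity`, here is a sketch outside of ggplot2. The helper `point_density()`, its `radius` and `n` arguments, and the grid lookup are assumptions for this example, not the package's implementation. It relies only on `MASS::kde2d()` and base R.

```r
library(MASS)  # provides kde2d()

# Hypothetical helper, not part of ggpointdensity
point_density <- function(x, y, method = c("nn", "kde2d"), radius = 0.5, n = 100) {
  method <- match.arg(method)
  if (method == "nn") {
    # Brute-force neighbor count: points closer than `radius`,
    # minus 1 so that a point does not count itself
    d <- as.matrix(dist(cbind(x, y)))
    density <- unname(rowSums(d < radius)) - 1
  } else {
    # Evaluate a 2D kernel density on an n-by-n grid,
    # then look up the grid cell each point falls into
    grid <- MASS::kde2d(x, y, n = n)
    ix <- findInterval(x, grid$x, all.inside = TRUE)
    iy <- findInterval(y, grid$y, all.inside = TRUE)
    density <- grid$z[cbind(ix, iy)]
  }
  # ndensity rescales so that the densest point has value 1
  data.frame(density = density, ndensity = density / max(density))
}

set.seed(1)
x <- rnorm(200)
y <- rnorm(200)
nn <- point_density(x, y, method = "nn")
kde <- point_density(x, y, method = "kde2d")
```

For "nn" the `density` column is a raw neighbor count, while for "kde2d" it is a kernel density estimate, so only `ndensity` is directly comparable across methods. The brute-force distance matrix is quadratic in the number of points and would not scale to large data.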
LKremer commented 5 years ago

> Returning a computed stat would bring the function's behavior in line with other ggplot2 functions (e.g. stat_density_2d returns density, ndensity, level, and nlevel).

This is already the case. stat_pointdensity computes a stat called n_neighbors. I just realized you can even use this stat to plot the density as you originally proposed:

# scale_color_viridis() is provided by the viridis package
ggplot(dat, aes(x = x, y = y, color = stat(n_neighbors / max(n_neighbors)))) +
    geom_pointdensity() +
    scale_color_viridis()

I could tweak the stat_pointdensity to return both n_neighbors and the density for convenience.

Regarding your last suggestion with method = "something", I'm experimenting with something like this at the moment, mostly to test different algorithms and find an efficient one that can handle many points (issue #2).

LKremer commented 4 years ago

This was implemented in @bjreisman's recent pull request #8, so I'm closing.