LKremer / ggpointdensity

:chart_with_upwards_trend: :bar_chart: Introduces geom_pointdensity(): A Cross Between a Scatter Plot and a 2D Density Plot.
GNU General Public License v3.0

What is the default bandwidth? #5

Closed: ecalfee closed this issue 4 years ago

ecalfee commented 5 years ago

I don't quite understand the documentation here: "This includes adjust, a multiplicate bandwidth adjustment used to adjust the distance threshold to consider two points as neighbors, i.e. the radius around points in which neighbors are counted. For example, adjust = 0.5 means use half of the default." If I want to know the radius within which neighbors were counted, what should I multiply the value of 'adjust' by? In other words, how do I know what 'default' was used? Alternatively, can I set the bandwidth directly as a fixed value in the units of my plot axes (e.g. number of neighbors within a 1 meter radius if I were plotting points on a map)? For plots where X and Y have the same units, this would make n_neighbors more intuitive.

LKremer commented 5 years ago

Hi there, in the case where both the x- and y-axes are on the same scale, the default radius is currently (xrange + yrange) / 70. Here xrange and yrange are taken from the plot limits, not from the range of the data. The division by 70 is pretty ad hoc but seems reasonable.
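
As a rough illustration of the formula above, the default radius and the `adjust` value needed for a particular radius can be worked out by hand. This is a minimal sketch in Python, not code from the package: the function names are made up for illustration, and it assumes that `adjust` scales the radius linearly, as the documentation's "half of the default" wording suggests.

```python
def default_radius(xlim, ylim, adjust=1.0):
    """Radius implied by (xrange + yrange) / 70, scaled by adjust.

    xlim and ylim are (min, max) pairs for the plot limits,
    not for the data. Hypothetical helper, not part of ggpointdensity.
    """
    xrange = xlim[1] - xlim[0]
    yrange = ylim[1] - ylim[0]
    return adjust * (xrange + yrange) / 70


def adjust_for_radius(target_radius, xlim, ylim):
    """Value of adjust that would give target_radius (assumes linear scaling)."""
    return target_radius / default_radius(xlim, ylim)


# Plot limits of 0..10 on both axes, in the same units (e.g. meters)
print(default_radius((0, 10), (0, 10)))          # about 0.286
print(adjust_for_radius(1.0, (0, 10), (0, 10)))  # about 3.5 for a 1-unit radius
```

Under these assumptions, a plot spanning 10 units on each axis would need `adjust` of roughly 3.5 to count neighbors within 1 unit. This only holds when both axes share the same scale.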

It gets a little complicated when x and y are on completely different scales (e.g. when x ranges from 0 to 10 and y ranges from 0 to 10000). In that case it wouldn't make much sense to count points within a fixed radius in data units, because such a circle would be heavily stretched and look oval in plot space. So when calculating the point distances, I fiddled around a little and added a correction factor that weights x- and y-values differently in the distance calculation.
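
To make the idea of weighting x and y differently concrete, here is one possible approach, sketched in Python. It is not the correction factor the package actually uses, which is not spelled out in this thread. The sketch assumes each axis is rescaled by its plot range, so that the neighborhood is a circle in plot space. The function name, the `radius_fraction` parameter, and its default of 1/35 (which matches (xrange + yrange) / 70 when both ranges are equal) are all assumptions for illustration.

```python
def count_neighbors(xs, ys, xlim, ylim, radius_fraction=1 / 35):
    """Count, for each point, the other points inside a circle in plot space.

    Hypothetical sketch: both axes are rescaled to the 0..1 range of the
    plot limits, so the radius is a fraction of the plot width/height.
    """
    xrange = xlim[1] - xlim[0]
    yrange = ylim[1] - ylim[0]
    # Rescale to plot space so a unit of x and a unit of y are comparable
    u = [(x - xlim[0]) / xrange for x in xs]
    v = [(y - ylim[0]) / yrange for y in ys]
    r2 = radius_fraction ** 2
    counts = []
    for i in range(len(u)):
        n = 0
        for j in range(len(u)):
            if i != j and (u[i] - u[j]) ** 2 + (v[i] - v[j]) ** 2 <= r2:
                n += 1
        counts.append(n)
    return counts


# x spans 0..10 while y spans 0..10000, as in the example above
xs = [0, 0.1, 5, 10]
ys = [0, 100, 5000, 10000]
print(count_neighbors(xs, ys, (0, 10), (0, 10000)))
```

The first two points are 100 apart in raw y units but close together on the plot, so after rescaling they count as neighbors of each other. A plain Euclidean distance on the raw values would be dominated by the y-axis.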

If there is demand for this feature, I could add an option to set a fixed radius as you suggested.