Closed ecalfee closed 4 years ago
Hi there,
in the case where both the x- and y-axis are on the same scale, the default radius is currently `(xrange + yrange) / 70`. `xrange` and `yrange` are the plot limits, not the range of the data. The division by 70 is pretty ad hoc but seems reasonable.
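To make the arithmetic concrete, here is a rough sketch in Python (not the package's actual code) of the default radius as described above, and of how `adjust` would scale it if it is a plain multiplier, as the documentation quote suggests. The function names and the `(min, max)` tuple format for the limits are made up for illustration.

```python
def default_radius(xlim, ylim, divisor=70.0):
    """Default radius as described in this thread: (xrange + yrange) / 70.

    xlim and ylim are (min, max) plot limits, not the range of the data.
    """
    xrange_ = xlim[1] - xlim[0]
    yrange_ = ylim[1] - ylim[0]
    return (xrange_ + yrange_) / divisor


def effective_radius(xlim, ylim, adjust=1.0):
    """Radius actually used, assuming adjust simply multiplies the default."""
    return adjust * default_radius(xlim, ylim)


def adjust_for_radius(xlim, ylim, target_radius):
    """The adjust value that would give a chosen radius in axis units."""
    return target_radius / default_radius(xlim, ylim)


# Example: both axes span 0 to 35, so the default radius is (35 + 35) / 70.
xlim, ylim = (0.0, 35.0), (0.0, 35.0)
r_default = default_radius(xlim, ylim)
r_half = effective_radius(xlim, ylim, adjust=0.5)
adj = adjust_for_radius(xlim, ylim, target_radius=2.0)
```

Under these assumptions, a fixed radius in axis units can be emulated by dividing the desired radius by the default radius and passing the result as `adjust`. This only applies when both axes are on the same scale.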
It gets a little more complicated when x and y are on completely different scales (e.g. when x ranges from 0 to 10 and y ranges from 0 to 10000). Then it wouldn't make much sense to count points within some radius in data units, because such a circle would be skewed and look like an oval in plot space. So I fiddled around a little and added a correction factor that weights x- and y-values differently in the distance calculation.
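As an illustration of what such a weighting can look like, here is a small Python sketch. The exact correction factor is not spelled out in this thread, so the choice below (scaling y differences by `xrange / yrange`) is an assumption. So are the function name, the brute-force loop, and the decision not to count a point as its own neighbor.

```python
import math


def count_neighbors(points, radius, xlim, ylim):
    """Count, for each point, the other points within `radius`.

    Assumption: y differences are rescaled by xrange / yrange so that
    distances are measured as if both axes spanned the same range.
    """
    xrange_ = xlim[1] - xlim[0]
    yrange_ = ylim[1] - ylim[0]
    y_weight = xrange_ / yrange_  # hypothetical correction factor

    counts = []
    for i, (xi, yi) in enumerate(points):
        n = 0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # a point is not counted as its own neighbor here
            dist = math.hypot(xi - xj, (yi - yj) * y_weight)
            if dist <= radius:
                n += 1
        counts.append(n)
    return counts


# x runs from 0 to 10, y from 0 to 10000, as in the example above.
points = [(1.0, 1000.0), (1.5, 1200.0), (9.0, 9000.0)]
counts = count_neighbors(points, radius=1.0, xlim=(0.0, 10.0), ylim=(0.0, 10000.0))
```

Without the weighting, the first two points would be 200 units apart in y and would never count as neighbors at a radius of 1, even though they sit close together in the plot.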
If there is demand for this feature, I could add an option to set a fixed radius as you suggested.
I don't quite understand the documentation here: "This includes adjust, a multiplicate bandwidth adjustment used to adjust the distance threshold to consider two points as neighbors, i.e. the radius around points in which neighbors are counted. For example, adjust = 0.5 means use half of the default." If I want to know the radius within which neighbors were counted, what should I multiply the value of 'adjust' by, i.e. how do I know what 'default' was used? Alternatively, can I set the bandwidth directly as a fixed value in the units of my plot axes (e.g. the number of neighbors within a 1 meter radius if I were plotting points on a map)? For some plots, where X and Y have the same units, this would make n_neighbors more intuitive.