JuliaStats / KernelDensity.jl

Kernel density estimators for Julia
Other
172 stars 40 forks source link

Add and subtract kerneldensities #93

Open mkborregaard opened 3 years ago

mkborregaard commented 3 years ago

Would it make sense to be able to do arithmetic on the kde objects, e.g. to add and subtract them, for comparing distributions?

tpapp commented 3 years ago

I am not quite sure it would make sense directly (with + and -). I am assuming that the underlying operation op you are thinking of is something like

kde(vcat(x, y)) ≈ op(kde(x), kde(y))

but even if we ignore technical complications (bandwidth selection etc) information is lost with the lengths of x and y (ie weights would be needed).

(Sorry if I misunderstood the question.)

mkborregaard commented 3 years ago

Not quite sure - to be exact this is what I wanted (I'll post my code, as it's essentially an MWE): I have the density of some variables (individual species plotted in a two-dimensional trait space after ordination) in two situations - before and after human disturbance of the ecosystem.

using CSV, Plots, DataFrames, StatsPlots, KernelDensity

after = CSV.read("Pcoa_AFTER_27jan21.txt", DataFrame)
before = CSV.read("Pcoa_BEFORE_27jan21.txt", DataFrame)

function pplot(obj)
    x, y = obj[!,"Axis.1"], obj[!,"Axis.2"]
    kd = kde((x,y))
    p = plot(kd, st = :contourf, color = :viridis, lc = :white, lw = 0.2)
    scatter!(p, x, y, msw = 0, color = :lightgrey, ms = 2, label = "")
    p
end

plot(
    pplot(before),
    pplot(after),
    size = (1000, 400), clim = (0, 60)
)

kdes What I'm interested in here is the difference surface between the two - what areas have denser distributions and what have less dense? So the result is not really a kde (as it doesn't sum to 1). Here's an example from ggplot2 of + instead of -. https://files.slack.com/files-pri/T68168MUP-F01KW2657PH/image.png

I'm not sure it makes 100% sense statistically, sorry - there was some discussion on Slack https://julialang.slack.com/archives/C6821M4KE/p1611916388016600

mkborregaard commented 3 years ago

But I guess the conclusion is that + and - does not make sense for this package.