JuliaStats / KernelDensity.jl

Kernel density estimators for Julia
Other
172 stars 40 forks source link

Interface of KernelDensity.pdf is not consistent with Distributions.pdf #120

Open jaksle opened 5 months ago

jaksle commented 5 months ago

I've noticed KernelDensity.pdf cannot be used in the same way as Distributions.pdf. I think this is very confusing for users and is a potential source of errors. In particular, pdf.(kde::UnivariateKDE, x::Vector) and pdf(kde::BivariateKDE, M::Matrix) do not work.

I could try adding them if there is agreement it should be done.

tpapp commented 5 months ago

Yes, it is unfortunate that the package defines a pdf method that works on vectors. This predates broadcasting (I think). It would be best to remove pdf(ik::InterpKDE,xs::AbstractVector) & friends and make things work with broadcasting.

As for pdf(kde::BivariateKDE, M::Matrix), I would very much prefer the solution as above combined with eachcol and/or eachrow. There is no preferred/canonical way to iterate over vectors in matrices in Julia, it is best to be explicit.

jaksle commented 5 months ago

I will look at this. Leaving it as it is would be asking for problems later, e.g. when more dimensional kde would be implemented. I'd strictly comply with the Distributions.jl interface.

jaksle commented 5 months ago

This is actually simpler than I thought. The pdf method for vectors was not a problem. The problem was broadcast had no information about the types UnivariateKDE and InterpKDE. You only need to add information they are scalars

Base.broadcastable(s::InterpKDE) = Ref(s)

the same for UnivariateKDE. But then, the broadcast is very slow as it calculates interpolation for each point separately. Fortunately, fixing it is even simpler, but requires a breaking change: making InterpKDE non-mutable. Then the compiler knows making those interpolations is the same call and broadcast becomes fast, actually 50% faster than the old vector method!

I'd like to stress now I see no reasonable reason as why InterpKDE was mutable. It seems completely useless trying to modify it. The same for other KDE types; changing that behaviour is not required for this problem, but I suspect this can cause efficiency problems in other circumstances.

Adding matrix argument was done in pull request #102. Together this would make KernelDensity comply with Distributions.