JuliaPlots / StatsPlots.jl

Statistical plotting recipes for Plots.jl

Plot of ecdf with groups gives incorrect plot #544

Open DrChainsaw opened 10 months ago

DrChainsaw commented 10 months ago

Sorry if this is the wrong repo. I couldn't easily figure out where the group keyword is implemented.

using DataFrames, StatsBase, StatsPlots
DataFrame(a = repeat([0, 1], 100), b = repeat([0, 1], 100)) |> @df plot(ecdf(:a); group=:b)

Gives the following plot: [image]

Since a and b are the same sequence of 0,1,0,1... the correct plot is one vertical line at x=0 for b=0 and one vertical line at x=1 for b=1.

My guess is that some unintended broadcasting of a scalar prevents the above from erroring out, since ecdf(:a) returns a single ECDF struct containing all the data. I suppose one would get exactly the same issue with any other function that returns a plottable object which is not a vector.

Perhaps the above can be made to work if the grouping happens before any function is applied, but throwing an error would also be fine I guess.
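
For illustration, here is a minimal sketch of what "grouping before the function is applied" could look like when done by hand, assuming DataFrames.jl and StatsBase.jl are installed. The name `ecdfs` is introduced only for this example, and this is not a claim about how the `group` keyword is or should be implemented:

```julia
using DataFrames, StatsBase

df = DataFrame(a = repeat([0, 1], 100), b = repeat([0, 1], 100))

# Split the rows by :b first, then build one ECDF per group.
# `ecdfs` is a hypothetical helper name, keyed by the group value.
ecdfs = Dict(first(g.b) => ecdf(g.a) for g in groupby(df, :b))

# Each group contains a single distinct value of :a,
# so each ECDF is one step from 0 to 1 at that value.
ecdfs[0](-0.5)  # 0.0
ecdfs[0](0)     # 1.0
ecdfs[1](0.5)   # 0.0
ecdfs[1](1)     # 1.0
```

Each per-group ECDF jumps from 0 to 1 at a single point, which matches the expected plot described above: one vertical line at x=0 for b=0 and one at x=1 for b=1.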

sethaxen commented 10 months ago

IIRC the problem here (and with a few other oddities of ecdfplot) is that it's defined as a user plot and not a series type, which greatly limits how it can be used. I implemented a series type version (https://github.com/arviz-devs/ArviZPlots.jl/blob/main/src/ecdfplot.jl) that should probably replace this package's implementation.

DrChainsaw commented 10 months ago

Can confirm that the ecdfplot implementation linked above works for this example. I guess it is straightforward to generalize the recipes to also work for plot(::ECDF, ...), right?

sethaxen commented 10 months ago

I suspect it would be straightforward, but I haven't thought yet about how to do it.