JuliaPlots / StatsPlots.jl

Statistical plotting recipes for Plots.jl

Grouped ecdfplot(..., group=) plot emits different result from multiple ecdfplot(...)s for each group #460

Open tai opened 3 years ago

tai commented 3 years ago

I noticed that StatsPlots.ecdfplot() produces an unexpected plot when I use the group= parameter.

When I use 2 ecdfplot() calls to compare 2 groups, I get this:

using DataFrames, StatsPlots
df = DataFrame(v=1:100, ab=vcat(repeat(["A"], 50), repeat(["B"], 50)));
ecdfplot(df[df.ab .== "A", :v], label="A")
ecdfplot!(df[df.ab .== "B", :v], label="B")

image

These CDF plots are correct: group "A" contains 1:50 and group "B" contains 51:100.
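
As a quick sanity check on what each group's curve should look like, the empirical CDF can be computed by hand without any plotting package. This is only a sketch: `ecdf_at` is a hypothetical helper name introduced here, not something StatsPlots provides.

```julia
# Empirical CDF of the sample `x` evaluated at `t`:
# the fraction of observations that are <= t.
# (`ecdf_at` is a hypothetical helper, not a StatsPlots function.)
ecdf_at(x, t) = count(xi -> xi <= t, x) / length(x)

# Same data as in the report: 1:50 belongs to "A", 51:100 to "B".
v  = collect(1:100)
ab = vcat(fill("A", 50), fill("B", 50))

a = v[ab .== "A"]
b = v[ab .== "B"]

println(ecdf_at(a, 25))  # 0.5: half of group A is <= 25
println(ecdf_at(b, 25))  # 0.0: group B starts at 51
println(ecdf_at(a, 75))  # 1.0: all of group A is <= 75
println(ecdf_at(b, 75))  # 0.5: half of group B is <= 75
```

So the curve for "A" should rise from 0 to 1 over 1:50, and the curve for "B" should stay at 0 until 51 and then rise to 1 over 51:100.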

However, once I switch to a single ecdfplot() call with the group= parameter, I get this for the same dataframe:

# Also,
# @df df ecdfplot(:v, group=:ab)
# emits the same plot
ecdfplot(df.v, group=df.ab)

image

This plot doesn't look right. I expected an independent CDF plot for each group, just like the first result.

I was struggling to figure out why I was getting such similar CDF plots for the 2 groups I'm comparing (with real data), and ended up with the examples above.

This is the setup I'm on:

Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, broadwell)

[91a5bcdd] Plots v1.19.4
[f3b207a7] StatsPlots v0.14.26

sethaxen commented 3 years ago

Yes, this occurs because ecdfplot is currently implemented as a user plot. #453 should fix this once it is finished.
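
Until #453 lands, one possible workaround is to generalise the two-call approach from the report into a loop over the group labels. This is a minimal sketch, assuming the same `df` as above; it only uses the `ecdfplot!` call already shown in the report and is not the fix proposed in #453.

```julia
using DataFrames, StatsPlots

# Same example data as in the report.
df = DataFrame(v=1:100, ab=vcat(repeat(["A"], 50), repeat(["B"], 50)))

# Start from an empty plot and add one ECDF series per group,
# instead of passing group= to a single ecdfplot call.
plt = plot()
for g in unique(df.ab)
    ecdfplot!(plt, df[df.ab .== g, :v], label=g)
end
plt
```

Each iteration is one independent ecdfplot! call on that group's values, so the result should match the first plot in the report.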