JuliaML / MLPlots.jl

Plotting recipes for statistics and machine learning using Plots.jl
Other
24 stars 5 forks source link

ROC using ROCAnalysis #7

Closed diegozea closed 8 years ago

diegozea commented 8 years ago

ROC curves for MLPlots! :D

image

Evizero commented 8 years ago

nice!

Evizero commented 8 years ago

It would be nice to see the AUC too. you can get n (which is what I think is meant with plt.n) by get(d, :n, 0) == 0

tbreloff commented 8 years ago

Cool. Is there a reason that you build the vectors by pushing? For example, I would do the following to set the line styles in one line:

get!(d, :linestyle, [:solid :dash])

Similarly linealpha and xargs/yargs can be one-liners.

Finally, it should be unnecessary (and possibly unwanted) to set the xlim/ylim args.

Let me know your thoughts... Thanks for the PR.

On Dec 16, 2015, at 3:57 PM, Diego Javier Zea notifications@github.com wrote:

ROC curves for MLPlots! :D

You can view, comment on, or merge this pull request online at:

https://github.com/tbreloff/MLPlots.jl/pull/7

Commit Summary

ROC using ROCAnalysis File Changes

M src/MLPlots.jl (3) A src/ROCAnalysis/roc.jl (37) Patch Links:

https://github.com/tbreloff/MLPlots.jl/pull/7.patch https://github.com/tbreloff/MLPlots.jl/pull/7.diff — Reply to this email directly or view it on GitHub.

tbreloff commented 8 years ago

Fyi... The "argument" n refers to the number of subplots, the "field" n refers to the number of series. I should really change that... Especially since the Subplot type has a field p which is the number of subplots. Oops. (My excuse... The field n came way before the argument n in the dev cycle, and until now I didn't care because its just internals) so your suggestion of "get(d, :n, 0)" won't work.

On Dec 16, 2015, at 4:52 PM, Christof Stocker notifications@github.com wrote:

It would be nice to see the AUC too. you can get n (which is what I think is meant with plt.n) by get(d, :n, 0) == 0

— Reply to this email directly or view it on GitHub.

tbreloff commented 8 years ago

I merged... I'm going to make a couple changes for brevity and push them up in a bit.

tbreloff commented 8 years ago

I cleaned up a bit, added the annotation back in, and removed the xlim/ylim lines.

I'm a little confused... why were you checking for plt.n == 0? Are you adding this to an existing plot in your workflow?

Anyways here's what it looks like for me in pyplot:

tmp

If you disagree with any of my changes, don't hesitate to yell at me.

diegozea commented 8 years ago

Thanks for merging and cleaning up my PR.

You're right, the problem is when I try to add a plot over an existing one, the AUC labels look awful as in the example below.

image

I tried to use plt.n == 0 to avoid the AUC label. However, I think that a keyword argument auc would be a better solution (using by default auc=true).

tbreloff commented 8 years ago

Thanks for the example. I see the issue, of course I think it's misleading to have the AUC annotation there at all if you're overlaying curves.

What do you think about the following alternative:

tmp

By the way... the "current plot" is saved automatically and filled in when missing, similar to matlab or pylab, so in your example, this:

plt = plot(roc1);
plot!(plt, roc2)

can be:

plot(roc1)
plot!(roc2)
tbreloff commented 8 years ago

I just pushed up these changes, as well as adding an optional auc arg:

plot(roc1)  # adds annotation
plot!(roc2, false)  # omits annotation
Evizero commented 8 years ago

This looks nice. I think over-plotting of the AUC annotations is something that can not easily be avoided without some clever formatting heuristics.

diegozea commented 8 years ago

I like the changes you made to the plots. Thanks!

diegozea commented 8 years ago

@tbreloff plot fails with a Vector{ROCAnalysis.Roc{T<:Real}} :

julia> using ROCAnalysis, MLPlots

julia> rocs = [ roc(i*rand(100), rand(100)) for i in 1:5 ];

julia> plot(rocs)
[Plots.jl] Initializing backend: pyplot
ERROR: type Roc has no field ptar
 in computeXandY at /home/diego/.julia/v0.4/Plots/src/plot.jl:273
 in createKWargsList at /home/diego/.julia/v0.4/Plots/src/plot.jl:308
 in createKWargsList at /home/diego/.julia/v0.4/Plots/src/plot.jl:352
 in plot! at /home/diego/.julia/v0.4/Plots/src/plot.jl:96
 in plot at /home/diego/.julia/v0.4/Plots/src/plot.jl:56

I expected a behaviour like this:

julia> for i in 1:length(rocs)
           if i == 1
               plot(rocs[i])
           else
               plot!(rocs[i])
           end
       end

Should it be implemented with a definition of Plots._apply_recipe(d::Dict, roc::AbstractVector{ROCAnalysis.Roc}... or should plot{T}(vec::AbstractVector{T}... do an interaction over the vector and calling plot!(elem::T) by default?

tbreloff commented 8 years ago

I expected a behaviour like this

I can look into adding this behavior, but I can't guarantee I'll follow through. I need to be sure that doing this doesn't break other, more common functionality. I may have a vector of values, points, series, or recipes... all of various (possibly user-defined) types, the underlying code needs to be generic enough to handle them all robustly and in the correct way. I say all this so that you understand the incredible complexity in providing such a simple API, so I need to think through the best way to enable it.

should plot{T}(vec::AbstractVector{T}... do an interaction over the vector and calling plot!(elem::T) by default?

This is likely a bad idea, especially since most plotting commands have keyword args which apply to multiple series.

Should it be implemented with

My preference would be that you do it yourself in the _apply_recipe method, so the signature could be something like:

function Plots._apply_recipe(d::Dict, rocs::ROCAnalysis.Roc...; kw...)
  ...
end

This has the added benefit that you can set the guide labels once, and you only have one diagonal line. Plus you could do a better job in laying out the AUC annotations all at once (or even make a single annotation).

Evizero commented 8 years ago

I would vote against automatic vector unrolling based on eltype. This seems like a fragile idea.

Here is an example of a recipe to plot a vector of losses that makes very different choices for when plotted in a single window vs as subplot: https://github.com/tbreloff/MLPlots.jl/blob/master/src/LearnBase/loss.jl#L24-L35