JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/

Visualization tool for one and two-parameter tuning #85

Closed ablaom closed 4 years ago

ablaom commented 5 years ago

Would be nice to have some visualization tools (Plots.jl recipes?) for looking at the results of one or two-parameter tuning, as output in the report field of a TunedModel machine.

This might be as simple as adding a suitable instruction to the tuning doc string on how to call an existing recipe.

tlienart commented 5 years ago

Do you think using UnicodePlots might make sense here, so that one could keep working in the REPL? It's also much faster, and for simple visualisations it may be enough.

In a similar vein there's PrettyTables.jl, a lightweight package to quickly visualise dataframes and highlight elements, which might be good too (for quickly showing benchmark results, for instance, highlighting the best scores according to a range of metrics).

Possibly the visualisation could be inferred from the environment, so that richer multimedia tools are used if one is in Juno or IJulia?

datnamer commented 5 years ago

UnicodePlots is also a Plots.jl backend. I would recommend having plot recipes for Plots.jl and then eventually Makie. Then the user can choose the backend.

Edit: Maybe @mkborregaard can discuss his experiences with StatsPlots and how they might help here.

ablaom commented 5 years ago

All sound good to me. I'm leaving this one for someone else to implement!

mkborregaard commented 5 years ago

I agree that recipes would be a good choice. There are recipes in MCMCChain that I helped develop (https://github.com/TuringLang/MCMCChain.jl/blob/master/src/plot.jl) which might be similar to what is required here. I wouldn't mind giving a hand or contributing something if there's a clear description of what you'd like the plotting function to do, show, and dispatch on exactly.

fkiraly commented 5 years ago

Regarding "tuning plots", it might be best to dispatch the plot method on the fitted model?

Though I see issues with what the default would do:

One thing that the user also expects are "learning curves", though not all models may return these. Maybe separate the grid tuning plot from the learning curves?

ablaom commented 5 years ago

I think MLR uses bubbles on the grid points (for the 2D case). The bubble radius reflects the performance estimate. Maybe that's a better fallback?

@mkborregaard That would be awesome if you could contribute.

Some technical detail for implementers:

  1. Note that hyperparameters in MLJ are generally nested. See MLJ/README.md or MLJ/doc/tour.ipynb for details.

  2. The plotting (for tuning) would be dispatched on objects of type Machine{TunedModel}. After an object mach of this type has been fit (isdefined(mach, :fitresult) is true), one may extract the tuning stats from the dictionary mach.report, assuming model.report_measurements was true before the fit call.


mkborregaard commented 5 years ago

Sounds good. Could you provide an example of the kind of end result plot you have in mind?

ablaom commented 5 years ago

Here is the aforementioned example of a bubble plot from MLR, but I'm not fixed on this if others want to chime in with a different suggestion. Since performance estimates are generally close together, a proportional bubble size obviously doesn't work (i.e. some recentring/rescaling is called for).
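The recentring/rescaling mentioned above can be sketched in a few lines (a hypothetical helper, not part of MLJ): map the performance estimates affinely onto a marker-size range, so that estimates that are numerically close still get visibly different bubble sizes.

```julia
# Hypothetical helper (not part of MLJ): affinely map performance
# estimates z onto a marker-size range [smin, smax], so that estimates
# that are numerically close still get visibly different bubble sizes.
function bubble_sizes(z::AbstractVector{<:Real}; smin = 4.0, smax = 20.0)
    lo, hi = extrema(z)
    hi == lo && return fill((smin + smax) / 2, length(z))  # flat landscape
    return smin .+ (z .- lo) .* (smax - smin) ./ (hi - lo)
end
```

Note that the best estimate gets the largest bubble regardless of how tight the spread is, which is exactly the "dishonest" aspect of rescaling.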

By default, add the following annotations (in the title?): the value of mach.measure (e.g. auc), the value of typeof(mach.tuning) (the tuning strategy, e.g. Grid), and the value of mach.resampling (the resampling strategy, e.g. CV(nfolds=6)); typeof(mach.resampling) (e.g. CV) would also do.

[image: bubble plot of two-parameter tuning results from MLR]

fkiraly commented 5 years ago

Looks good, just a small note: aren't the axes mislabelled? Shouldn't C/sigma in an SVM always be positive? These are probably the untransformed values, i.e. log(C) and log(sigma) for some logarithm base. The axes should say so, or display the correct values for the parameters.

Is this a bug in mlr?

ablaom commented 5 years ago

Dunno. Just copied and pasted from MLR slides

fkiraly commented 5 years ago

Hm, also, shouldn't "area under the (ROC) curve" be at least 0.5? Something is fishy in this plot...

Anyway, I personally like the heatplot more for this purpose, but that's really just a matter of taste.

mkborregaard commented 5 years ago

OK, so what's the suggestion when using a non-linear scale? Here's some work with the tuned_ensemble model from the readme.

Scatter: [image: bubble scatter of tuning results] As you can see, lots of overlapping points. Also, the points have very similar sizes (and IMHO it's not appropriate to rescale when using sizes, as our eyes intuitively compare with 0).

You could use color as well: [image: scatter with colour-mapped measurements]

Heatmap gives more weight to the color and fills in: [image: heatmap of tuning results]

You could adjust the grid to be even if preferred: [image: heatmap on an evenly spaced grid]
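For the heatmap variants, the flat vectors of sampled parameter values have to be pivoted onto a matrix first. A minimal sketch, assuming the samples come from a full grid (the function name and argument shapes here are my own, not MLJ's):

```julia
# Sketch: pivot flat tuning results (x, y, z vectors of equal length)
# onto a matrix Z suitable for a heatmap. Assumes every (x, y)
# combination of a full grid appears exactly once.
function to_grid(x, y, z)
    xs, ys = sort(unique(x)), sort(unique(y))
    Z = fill(NaN, length(ys), length(xs))   # rows = y, cols = x
    for (xi, yi, zi) in zip(x, y, z)
        Z[findfirst(==(yi), ys), findfirst(==(xi), xs)] = zi
    end
    return xs, ys, Z
end
```

Cells with NaN (samples missing from the grid, e.g. under random search) would then simply render as blanks in most backends.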

fkiraly commented 5 years ago

hm, would it look better if you:

mkborregaard commented 5 years ago

But if it were to be general, would that require examining the object somehow to see that one axis was log-spaced, and then scaling it? And I do use different-sized bubbles. The values are just really similar, and, as I said, it's not really fair to transform by the smallest size in terms of honest visualization.

mkborregaard commented 5 years ago

Also, for some reason the grid looks weird when log transforming: [image: bubble plot with log-scaled axis showing unevenly spaced grid points]

tlienart commented 5 years ago

That's nice! Just chipping in, but it seems quite common that models end up having very similar performances over a range of hyperparameters, especially trivial models like KNN, so it's probably fair to just return a flat landscape where that's the case. That being said, it seems to me KNN will in general only vary for very low K (from K=1 to K=10, maybe), so maybe the range shown here on the plots is a bit too big?

Finally I feel the first heatmap is the visualisation that is the clearest out of the ones posted here but maybe just personal preference.

mkborregaard commented 5 years ago

(not a response, just the notion that a surface plot might also be intuitive here) [image: 3D surface plot of the tuning landscape]

tlienart commented 5 years ago

IMO 3D plots look cool but are in general hard to read. Also with the idea that we could potentially have MLJ revert to UnicodePlots backend in a "fully-REPL-mode", 3D would not be the best option I think?

mkborregaard commented 5 years ago

I agree with your first point in principle, but not, in fact, for fitting surfaces. Anyway, the heatmap recipe would allow the user to default to surface whenever appropriate.

fkiraly commented 5 years ago

This might highlight the challenge of determining proper axis ranges (or dot-size ranges). Scaling to min/max ± epsilon might be necessary.

ablaom commented 5 years ago

The scale used is part of the information accessible to you from the TunedModel object. Because of the nested nature of the hyperparameters, it is not in the most convenient form for 2D plotting but I will work on this over the next day or so.

I will arrange to have the scales output as report[:parameter_scales] (see technical notes above).

ablaom commented 5 years ago

Okay, scales are now available; see issue #92 for details and example.

mkborregaard commented 5 years ago

And did you prefer the heatmap or the dots? Which of the above appeals?

ablaom commented 5 years ago

@mkborregaard Marvellous work, by the way.

I guess I prefer the bubbles as a default because you can see where the samples were actually taken and, at a glance, what the resolution was, and so forth. Also, a bubble size is immediately understood, whereas a colour needs to be interpreted. Despite your concern for "honest" reproduction, I would rescale as others have also suggested. How else can I distinguish estimates that are typically very close together? A really honest plot would have an indication of the uncertainty of the estimate at each point (available in principle in the CV case but not for holdout), but I think these plots serve more of a diagnostic purpose than a final reporting one, no?

That said, I do not have a strong preference and you are never going to please everyone. As long as I can quickly see where the low or unusual points are, I am happy.

Side issue: How would we present a 2D parameter plot using UnicodePlots? I would love the option of a REPL plot without having to load the Plots.jl frontend first (regular pain!!).

BTW: The log-plot spacing is not uniform in your bubble plot above because K is integer valued, so rounding forces non-uniform spacing on the log scale. This is just the way it is; nothing wrong with using the log scale there (which was the scale used to generate the grid).
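That rounding effect is easy to reproduce with plain ranges (nothing MLJ-specific): a log-spaced grid of integer K values is no longer evenly spaced once rounded back to integers.

```julia
# A log-spaced grid of 6 points between 10^0 and 10^1, rounded to
# integers as required for K in KNN:
K = round.(Int, exp10.(range(0, 1, length = 6)))  # [1, 2, 3, 4, 6, 10]

# The gaps on the log scale are no longer uniform after rounding:
gaps = diff(log10.(K))
```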

fkiraly commented 5 years ago

Hm, plot-wise I think the heatmap is the clearest, with the surface plot also not bad, though it is more difficult to read and to objectively compare z-values on. Though I personally prefer to look at the matrices (with numbers) rather than any plot.

@ablaom, regarding the uncertainty estimate: absolutely agreed. But it is unclear how to show it in a 2D heatmap or a 3D plot. Uncertainty envelopes/tunnels are, however, good to show in a 1D performance-vs-parameter-value plot (if there's one parameter only). Which brings us to the topic of how to compute confidence intervals/regions, one of my favourite topics that you should not have gotten me started on. Spoiler: I don't think using cross-validation re-samples gives good confidence intervals.

ablaom commented 5 years ago

@fkiraly

Though I personally prefer to look at the matrices (with numbers) rather than any plot.

Me too!

Spoiler: I don't think using cross-validation re-samples gives good confidence intervals.

Look forward to discussion at meeting next week.

ablaom commented 5 years ago

@mkborregaard Please note that reports have just become named tuples. So access is by property, not index. See NEWS.md.

#94

mkborregaard commented 5 years ago

I've got to say I'm spending almost all of the time here searching for a way to generate the object you want plotted, rather than coming up with a recipe. The old one I had generated from your readme doesn't work anymore, and the tuned model generated on your tour doesn't seem to be of the type you're interested in. There's an example over in #92, but it's missing something called sel. I'll give up on this now, but will check back later next week if you post an example.

fkiraly commented 5 years ago

@mkborregaard thanks for the feedback - this might be taken to mean that the interface for inspecting fitted models may have to be improved, in general - see ongoing #51 discussion.

Thus, it might make sense to assume you have the data already in a nice format, and wait for the interface extension to give the objects nicely to you, rather than find the best way to pry it from the learning machine's cold, dead hands.

Unless, of course, @ablaom recommends another way to proceed.

ablaom commented 5 years ago

@mkborregaard

I'm sorry to hear about your frustrations. The main issue, I believe, is the unlucky breaking change in the format of report (from dictionary to named tuple), together with the incompleteness of the example posted at #51 (which the change immediately made redundant), for which I apologise.

and the tuned model generated on your tour doesn't seem to be of the type you're interested in.

I'm interested in this example. I have posted a distilled, complete, and tested version of the tour example, showing how to get everything I expect you need. You need to update your MLJ installation (including MLJBase and MLJModels) to the version posted about 17 hours ago to get it to work but let me know if you have problems.

I doubt very much there will be further API changes in the foreseeable future that will break code based on this example. Let me know if and when you decide to have another go and thanks again for the work so far.

Your method (adapted into a recipe) will look something like:

```julia
function plot(mach::MLJ.Machine{<:MLJ.EitherTunedModel})
    r = report(mach)
    xlab, ylab = r.parameter_names
    xscale, yscale = r.parameter_scales
    x = r.parameter_values[:, 1]
    y = r.parameter_values[:, 2]
    z = r.measurements
    # <code to generate plot>
end
```

EitherTunedModel is an alias for Union{DeterministicTunedModel,ProbabilisticTunedModel}.
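As a RecipesBase sketch of the bubble-plot idea discussed above (the wrapper struct and all field names are hypothetical; a real implementation would dispatch on the machine type rather than on a wrapper):

```julia
using RecipesBase

# Hypothetical container for the fields extracted from report(mach).
struct TuningResult
    x::Vector{Float64}
    y::Vector{Float64}
    z::Vector{Float64}
    xlab::String
    ylab::String
end

# User recipe: after `using Plots`, `plot(result)` renders a bubble
# plot with any backend, marker sizes rescaled from the measurements.
@recipe function f(r::TuningResult)
    seriestype --> :scatter
    xguide --> r.xlab
    yguide --> r.ylab
    lo, hi = extrema(r.z)
    markersize --> 4 .+ 16 .* (r.z .- lo) ./ (hi - lo + eps())
    r.x, r.y
end
```

Depending only on RecipesBase (not Plots) keeps the dependency lightweight, which was one of the motivations for recipes raised earlier in the thread.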

mkborregaard commented 5 years ago

Thanks for making it so easy for me now - I'll post the recipe as soon as I have a moment :-)

mkborregaard commented 5 years ago

#99

baggepinnen commented 4 years ago

I find it quite useful to visualize hyper-parameter tuning for more than 2 parameters as well. Simply plotting ranges vs function values for all parameters is a reasonable way of presenting the information. You can't determine interactions between parameters from this, but you can see overall trends for individual parameters, and it quickly becomes apparent if one parameter is much more important than the others. Example: https://github.com/baggepinnen/Hyperopt.jl/blob/master/figs/ho.svg
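A sketch of that marginal view (all names hypothetical; assumes samples are stored as one vector per parameter plus an aligned score vector): each parameter gets its own values-vs-score series, sorted by value so trends are visible, at the cost of ignoring interactions.

```julia
# Sketch: one marginal (value, score) series per tuned parameter, as
# in the Hyperopt.jl figure linked above. `params` maps a parameter
# name to its vector of sampled values; `scores` is aligned with them.
function marginal_series(params::Dict{Symbol,<:AbstractVector}, scores::AbstractVector)
    series = Dict{Symbol,Tuple{Vector,Vector}}()
    for (name, values) in params
        p = sortperm(values)                   # sort by parameter value
        series[name] = (values[p], scores[p])  # one panel per parameter
    end
    return series
end
```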

mkborregaard commented 4 years ago

@baggepinnen 's plot looks nice.

Should this issue have been closed though?

ablaom commented 4 years ago

closed in favour of #416