btwardow / FactorizationMachines.jl

Factorization Machines for Julia

Hyperparameter optimization utilities #7

Open Hydrotoast opened 8 years ago

Hydrotoast commented 8 years ago

There are several hyperparameters for factorization machines that are crucial to convergence (at least in the tests I've been running), especially with SGD.

A simple candidate utility that is easy to implement would be random search, as proposed by Bergstra and Bengio. scikit-learn has a reference implementation that we may use.

Implementation Plan

Still a work in progress

Hyperparameter optimization utilities may live in src/hyperparams.jl.

There are two important use cases that we may implement as two separate functions:

  1. Analyzing the choice of hyperparameters and how it affects the evaluation
  2. Building an optimal model with the best hyperparameters

We use the Distributions package to define parameter distributions.

using Distributions

param_distributions = Dict(:alpha => Uniform(0.01, 1.0), :initStd => Gamma(1.0, 1.0))
num_samples = 100
result = fm_randomsearch(X, y, param_distributions = param_distributions, num_samples = num_samples)

fm = result.model
info(result.param_scores)

And a corresponding implementation sketch:

function fm_randomsearch(X, y; param_distributions = Dict(), num_samples = 10)
    param_scores = Tuple[]             # one (sampled parameters, score) pair per iteration
    best_score = typemax(Float64)      # assume lower scores are better
    best_model = nothing
    for i in 1:num_samples
        # Draw one value per parameter, keeping the parameter names so the sample
        # can be splatted as keyword arguments into the training call.
        sample = Dict(k => rand(dist) for (k, dist) in param_distributions)
        model = fmTrain(X, y; sample...)
        score = evaluate(model, X, y)
        if score < best_score
            best_model, best_score = model, score
        end
        push!(param_scores, (sample, score))
    end
    RandomSearchResult(best_model, param_scores)
end
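
The sketch assumes a RandomSearchResult container (and an evaluate function) that do not exist in the package yet. A minimal definition of the container, with the field names only a suggestion, could be:

# Result container returned by fm_randomsearch (would be `struct` in Julia 0.6+).
immutable RandomSearchResult
    model                   # best model found during the search
    param_scores::Vector    # one (sampled parameters, score) tuple per iteration
end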
Hydrotoast commented 8 years ago

Another idea is to use macros to parse for loops and rewrite variables in the for-loop body.

@randomsearch for alpha = Uniform(0.01, 1.0), init_stddev = Gamma(1.0, 1.0)
    train(X, y, Methods.sgd(alpha = alpha, stddev = init_stddev))
end

This variable rewriting may be non-trivial.
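
Parsing the loop head is probably the easy part; a rough, untested sketch of such a macro (the name @randomsearch and the fixed number of draws are placeholders) could look like:

# Rewrite `for var = Distribution(...), ...  body  end` into a loop that draws one
# sample from each distribution per iteration and then runs the body.
macro randomsearch(loop)
    @assert loop.head == :for "expected a for loop"
    iters, body = loop.args
    # Normalize the single-spec case `for a = D` and the block case `for a = D1, b = D2`.
    specs = iters.head == :block ? iters.args : [iters]
    # Turn every `var = Distribution(...)` spec into a draw `var = rand(Distribution(...))`.
    draws = [:($(esc(s.args[1])) = rand($(esc(s.args[2])))) for s in specs]
    quote
        for trial in 1:10   # placeholder draw count; a real version would take it as an argument
            $(draws...)
            $(esc(body))
        end
    end
end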

btwardow commented 8 years ago

The idea of random search is great for FM, and the syntactic sugar with macros looks nice :-)

I'm rather skeptical about implementing things like grid/random search of hyper-params inside particular libraries. However, I've just googled for an available implementation in Julia and there is none! Or maybe I missed something?

I have never used random search for things like MF/FM. I just did a grid search over all the values I wanted to check. Those models are rather "fast learners" ;-) Random search was more useful for models with a large number of hyper-parameters, like the FFNN+RNN models in my work.
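
For what it's worth, that kind of plain grid search is only a few lines in Julia. A sketch reusing the hypothetical fmTrain/evaluate names from the snippets above, assuming a lower score is better:

# Exhaustive search over a handful of well-known hyper-parameter values.
best_score, best_model, best_params = typemax(Float64), nothing, nothing
for alpha in [0.001, 0.01, 0.1], init_std in [0.01, 0.1, 1.0]
    model = fmTrain(X, y; alpha = alpha, initStd = init_std)
    score = evaluate(model, X, y)
    if score < best_score
        best_score, best_model, best_params = score, model, (alpha, init_std)
    end
end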

Hydrotoast commented 8 years ago

I've found that the MLBase package has a single hyperparameter optimization procedure, gridtune, which performs grid search. It seems that no single package has dedicated much effort to this area.
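
From the MLBase docs, gridtune takes a model estimation function, an evaluation function, and (name, values) pairs, and returns the best model, configuration, and score. A rough example for FM (the exact signature is worth double-checking, and fmTrain/evaluate are the hypothetical names used above):

using MLBase

# estfun builds a model for one hyper-parameter combination;
# evalfun scores a fitted model (gridtune treats higher scores as better by default).
estfun(alpha, init_std) = fmTrain(X, y; alpha = alpha, initStd = init_std)
evalfun(model) = -evaluate(model, X, y)   # negate if evaluate returns a loss

best_model, best_cfg, best_score = gridtune(estfun, evalfun,
    ("alpha",   [0.001, 0.01, 0.1]),
    ("initStd", [0.01, 0.1, 1.0]))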

It is an interesting insight that hyperparameter optimization would work better for models with a larger number of hyperparameters. Perhaps it would be best to make this a low-priority task, then.

btwardow commented 8 years ago

No, I think you misunderstood me. It's not that it works better. It's more that when you have a simple model which learns fast and has only a few well-defined, well-known hyper-params, you simply do a grid search to check your values. But if you have doubts, or want to search over a specific hyper-parameter distribution, the random search method of Bergstra and Bengio is still useful.

Hydrotoast commented 8 years ago

Thank you for the clarification. I will keep those use cases in mind.