chengsoonong / acton

Active Learning: Predictors, Recommenders and Labellers
BSD 3-Clause "New" or "Revised" License

Integrate Hyperopt #96

Open chengsoonong opened 7 years ago

chengsoonong commented 7 years ago

http://hyperopt.github.io/hyperopt/

nbgl commented 7 years ago

I would like to implement this for active learning, but I am having trouble seeing how Bayesian optimisation is equivalent to active learning. I found this paper (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.724.7020) and have started making my way through it.

MatthewJA commented 6 years ago

Perhaps this could be achieved by subclassing Recommender and calling one step of hyperopt.
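A minimal sketch of what that might look like, assuming a Recommender-style interface (the class and method names here are invented for illustration, not acton's actual API; the hyperopt calls `fmin`, `tpe.suggest`, `hp.uniform` and `Trials` are real):

```python
import numpy as np
from hyperopt import Trials, fmin, hp, tpe


class HyperoptRecommender:
    """Hypothetical recommender that runs one TPE step per call.

    The interface is an assumption for illustration; acton's actual
    Recommender API may differ.
    """

    def __init__(self, bounds, objective):
        # bounds: list of (low, high) pairs, one per feature dimension.
        # objective: maps a feature vector to a score to *minimise*
        # (whatever active-learning score we settle on).
        self.dim = len(bounds)
        self.space = {'x{}'.format(i): hp.uniform('x{}'.format(i), lo, hi)
                      for i, (lo, hi) in enumerate(bounds)}
        self.objective = objective
        self.trials = Trials()

    def _vector(self, params):
        # Reassemble a feature vector from hyperopt's parameter dict.
        return np.array([params['x{}'.format(i)] for i in range(self.dim)])

    def recommend(self):
        # Ask TPE for exactly one more evaluation than we have run so far.
        best = fmin(fn=lambda params: self.objective(self._vector(params)),
                    space=self.space,
                    algo=tpe.suggest,
                    max_evals=len(self.trials.trials) + 1,
                    trials=self.trials)
        # `best` is the best point found so far in feature space.
        return self._vector(best)
```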

nbgl commented 6 years ago

I still can't think of how to phrase active learning as a Bayesian Optimisation problem… Have you thought about this, @MatthewJA?

MatthewJA commented 6 years ago

I think you would optimise over the feature space, then do a nearest-neighbour lookup from the optimum point back to the pool of unlabelled examples.
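The lookup step could be as simple as this sketch (`pool_features` and `optimum` are assumed names for the unlabelled feature matrix and the point the optimiser returns):

```python
import numpy as np


def nearest_pool_index(pool_features, optimum):
    """Index of the unlabelled example closest to the continuous optimum."""
    distances = np.linalg.norm(pool_features - optimum, axis=1)
    return int(np.argmin(distances))
```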

nbgl commented 6 years ago

That makes sense. What's your objective function though?

MatthewJA commented 6 years ago

(This is very similar to earlier active learning strategies that took the argmax over the whole feature space instead of over the available input data points.)

Good question... maybe expected model change?

MatthewJA commented 6 years ago

In the end you want to make your model as good as possible, so maybe directly optimising its loss against a validation set is a good idea.
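As a sketch, the objective for one candidate could be the validation loss after retraining with that candidate included. This assumes a scikit-learn-style predictor with `fit`/`predict`, and note that it takes the candidate's label as given, which is exactly the sticking point raised next:

```python
import copy

import numpy as np


def validation_loss_if_added(predictor, x_candidate, y_candidate,
                             X_train, y_train, X_val, y_val, loss):
    """Validation loss after adding one (x, y) pair to the training set.

    Hypothetical helper: assumes y_candidate is already known.
    """
    model = copy.deepcopy(predictor)
    model.fit(np.vstack([X_train, x_candidate]),
              np.append(y_train, y_candidate))
    return loss(y_val, model.predict(X_val))
```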

nbgl commented 6 years ago

Can you optimise the loss without knowing the label?

Also, optimising the expected model change sounds reasonable. If you have a finite set of unlabelled examples, then Bayesian optimisation becomes just a heuristic to avoid having to evaluate the objective for every unlabelled example. It makes a heap of sense if you can make up any example and give it to an oracle, since in that setting it's hard to optimise these objective functions over a continuous space without something like Bayesian optimisation.

MatthewJA commented 6 years ago

You could optimise a proxy to the loss (e.g. loss on labelled set). It was not uncommon a while back to consider scenarios where you generated queries rather than sampled them, but I'm having trouble finding a good reference for that.

nbgl commented 6 years ago

My question was more: you're trying to pick an example to label. How do you know how it will affect the loss without actually knowing that label?

MatthewJA commented 6 years ago

You have a probability distribution over labels from your predictor! Weight each candidate label's contribution by that and take the expectation.
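A sketch of that weighting, reusing the hypothetical `validation_loss_if_added` helper from above and assuming a scikit-learn-style `predict_proba` whose column order matches `classes`:

```python
def expected_validation_loss(predictor, x_candidate, classes, **kwargs):
    """Expected validation loss over the unknown label of x_candidate,
    weighted by the predictor's current class probabilities (a sketch)."""
    probabilities = predictor.predict_proba([x_candidate])[0]
    # kwargs carries X_train, y_train, X_val, y_val and loss through to
    # the validation_loss_if_added sketch above.
    return sum(p * validation_loss_if_added(predictor, x_candidate, y, **kwargs)
               for y, p in zip(classes, probabilities))
```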