HIPS / Spearmint

Spearmint Bayesian optimization codebase

Does the "actual value" of a discrete hyperparameter matter? #86

Closed cuihenggang closed 7 years ago

cuihenggang commented 7 years ago

Hi, I'm trying to use Spearmint to optimize the hyperparameters of my machine learning task. One hyperparameter (call it X, among many others) stands for the type of my model (SVM, logistic regression, etc.). For example, if X==0 it's SVM, and if X==1 it's logistic regression, etc.
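As a minimal sketch of the setup being described (the names `validation_error` and the score values are hypothetical; Spearmint does call a user-supplied `main(job_id, params)` with `params` as a dict of per-variable arrays):

```python
# Hypothetical Spearmint-style objective: the integer variable "X"
# selects which model family to evaluate.
def validation_error(model_name):
    # Placeholder scores standing in for real training/validation.
    return {"svm": 0.21, "logreg": 0.18}[model_name]

def main(job_id, params):
    # Spearmint passes params as a dict mapping variable names to arrays.
    x = int(params["X"][0])
    model = {0: "svm", 1: "logreg"}[x]
    return validation_error(model)
```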

However, I find that if I change the range of X so it starts from 1 (X==1 means SVM, and X==2 means logistic regression), the behavior of Spearmint becomes quite different (i.e., it proposes quite different hyperparameter choices for me to try).

Intuitively, the actual value range of X shouldn't matter. So I'm wondering: is what I see expected, or did I do something wrong?

Thank you!

mgelbart commented 7 years ago

You are correct that it should not matter whether the values run from 0 to N or from 1 to N+1. I'd like to look into this. Can you provide some more details? For example, can you confirm that you're using the latest commit of the master branch? Is it easy to share the actual function that you're optimizing?

Note that it might matter if you change the order of the discrete hyperparameter values, though. But you only shifted them, right?
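One way to see why a shift is harmless but a reordering is not (a sketch, assuming a stationary kernel that depends only on pairwise distances between the encoded values):

```python
# A stationary GP kernel sees only pairwise distances, so shifting every
# value of a discrete variable leaves the kernel matrix unchanged, while
# reordering the values does not.
def dists(values):
    return [[abs(a - b) for b in values] for a in values]

original  = [0, 1, 2]   # e.g. 0=SVM, 1=logreg, 2=random forest
shifted   = [1, 2, 3]   # same labels, all shifted by one
reordered = [2, 0, 1]   # same labels, different order

assert dists(original) == dists(shifted)     # shift: identical distances
assert dists(original) != dists(reordered)   # reorder: different distances
```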

cuihenggang commented 7 years ago

Yeah, I only shifted the hyperparameter values. Actually, I may have overstated (sorry) how different the behavior is with the shifted hyperparameters. Maybe the changed behavior is just noise (the function values are noisy and nondeterministic). Thank you for offering to look into it.

I'm glad the shift doesn't matter. Out of curiosity, does scaling matter? Suppose I have a function f(x) and I change it to g(x)=f(10*x). Will Spearmint behave differently?

mgelbart commented 7 years ago

No, the behaviour should not change under the transformation g(x)=f(10x).
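The equivalence can be checked directly (a toy sketch; the quadratic `f` and its domain are made up for illustration): optimizing f over [0, 10] and g(x) = f(10*x) over [0, 1] are the same problem, since each point u in g's domain corresponds to the point 10*u in f's domain.

```python
def f(x):
    return (x - 3.0) ** 2   # toy objective on [0, 10]

def g(x):
    return f(10.0 * x)      # rescaled objective on [0, 1]

# Every point u in [0, 1] maps to x = 10*u in f's domain with equal value,
# so the two optimization problems are identical up to that relabeling.
for u in [0.0, 0.3, 0.7, 1.0]:
    assert g(u) == f(10.0 * u)
```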

cuihenggang commented 7 years ago

Thank you for the explanation. I think this issue can be safely closed. Thank you!