How are category parameter values handled in Ax?

lena-kashtelyan commented 2 years ago

(Original question from @tramchamploo: https://github.com/facebook/Ax/issues/1048#issuecomment-1207750503)

How does Ax handle categorical values in the GP? Since Ax doesn't support conditional parameter spaces, is it reasonable to simply skip "invalid" parameter combinations in the evaluation code?
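On the first part of the question: one standard way to give a GP a categorical (choice) parameter is to one-hot encode it, so each category becomes its own 0/1 input dimension next to the (scaled) continuous parameters. This avoids imposing a fake ordering on the categories. Ax applies its own transforms to choice parameters, and the exact encoding may depend on the parameter type and version, so the stdlib sketch below only illustrates the general technique. The helper names (`one_hot`, `encode_point`) and the example parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[float, str]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Return a 0/1 vector with a single 1.0 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"{value!r} is not one of {list(categories)}")
    return [1.0 if c == value else 0.0 for c in categories]


def encode_point(
    point: Dict[str, Value],
    continuous: Dict[str, Tuple[float, float]],
    categorical: Dict[str, Sequence[str]],
) -> List[float]:
    """Map a parameterization to a flat numeric vector a GP could consume.

    Continuous parameters are min-max scaled to [0, 1]; categorical ones are
    one-hot encoded. The column order follows the dicts' insertion order.
    """
    x: List[float] = []
    for name, (lo, hi) in continuous.items():
        x.append((float(point[name]) - lo) / (hi - lo))
    for name, cats in categorical.items():
        x.extend(one_hot(str(point[name]), cats))
    return x


if __name__ == "__main__":
    continuous = {"lr": (0.0, 0.1)}
    categorical = {"optimizer": ["sgd", "adam", "rmsprop"]}
    print(encode_point({"lr": 0.05, "optimizer": "adam"}, continuous, categorical))
```

A categorical parameter with k values therefore adds k input dimensions, which is worth keeping in mind when a search space has several high-cardinality choices.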

lena-kashtelyan commented 2 years ago

@tramchamploo, if I understand you correctly, you are looking to use Ax to optimize over a hierarchical search space. This is supported, albeit in alpha-mode; see an example here: https://github.com/facebook/Ax/issues/140#issuecomment-1009015750. Let us know if you have questions about the approach that are not covered in #140!
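To make the difference concrete: with a hierarchical search space, a parent choice parameter decides which child parameters are active, so inactive parameters are never generated in the first place and there are no "invalid" combinations to discard in the evaluation code. The stdlib sketch below shows only that idea; the nested-dict layout, the `SPACE` contents, and the `sample` helper are hypothetical and loosely mirror the parent/child structure discussed in #140, not Ax's actual API.

```python
import random
from typing import Any, Dict

# Hypothetical tree: the value chosen for "model" decides which children exist.
SPACE: Dict[str, Any] = {
    "model": {
        "Linear": {"learning_rate": (1e-4, 1e-1), "l2_reg_weight": (1e-6, 1e-2)},
        "XGBoost": {"num_boost_rounds": (10, 100)},
    }
}


def sample(space: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Draw one parameterization containing only the active parameters."""
    params: Dict[str, Any] = {}
    for root, branches in space.items():
        choice = rng.choice(sorted(branches))  # pick a branch of the tree
        params[root] = choice
        for name, (lo, hi) in branches[choice].items():
            if isinstance(lo, int) and isinstance(hi, int):
                params[name] = rng.randint(lo, hi)  # integer child parameter
            else:
                params[name] = rng.uniform(lo, hi)  # continuous child parameter
    return params


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample(SPACE, rng))
```

Each sampled point contains only the parameters that matter for its branch, for example `learning_rate` never appears alongside `num_boost_rounds`.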

tramchamploo commented 2 years ago

~So only Sobol is supported at this point? I want to use BO; is it fine for me to just cut useless branches myself in the evaluation code? If so, why not just implement it in Ax?~

I read through the thread and found that it is supported in BO. So what is this alpha mode? I don't see any code that specifies an "alpha" mode. Or does Ax just handle it implicitly when using a hierarchical search space?

lena-kashtelyan commented 2 years ago

Oh sorry, by "alpha mode" I just meant that it's new and pretty experimental functionality.

lena-kashtelyan commented 2 years ago

@tramchamploo, did I answer your question?

tramchamploo commented 2 years ago

@lena-kashtelyan Sorry for the late response. I'm actually investigating the possibility of migrating from Optuna to Ax. I found that Ax offers great flexibility and a highly abstract API, which is great for experiment customization. The dilemma is that Optuna mainly focuses on TPE algorithms, and I think Ax's best algorithm is ALEBO, which is based on Facebook's own research. Do you have a performance comparison between the two? I'm especially interested in exploration vs. exploitation, since my optimization seems to get trapped in local minima under TPE. Does ALEBO or bandit optimization offer any improvement on problems where some of the variables are discrete?

lena-kashtelyan commented 2 years ago

cc @dme65 for the methods question

lena-kashtelyan commented 2 years ago

I found that Ax offers great flexibility and a highly abstract API, which is great for experiment customization.

So very glad to hear!!

Balandat commented 2 years ago

I think Ax's best algorithm is ALEBO

"Best" in what way? Different algorithms work well on different problems (ALEBO is targeted at very high-dimensional search spaces). What kinds of problems are you trying to solve? How many parameters are in the search space? How many of these are continuous, and how many are discrete (ordinal or categorical)? How many evaluations can you do in a typical problem? Are you doing single-objective, constrained, or multi-objective optimization?

tramchamploo commented 2 years ago

I have 20 parameters in my search space: about 5 are discrete and the rest are continuous. I can do 10,000 evaluations, and I'm doing constrained single-objective optimization. However, I can move these constraints "into" my evaluations so that the algorithms only see an unconstrained problem.
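One simple way to move constraints "into" the evaluation, as described above, is a penalty: infeasible points get an objective value inflated in proportion to how badly they violate the constraints, so the optimizer sees an unconstrained (minimization) problem. The stdlib sketch below assumes constraints written as functions that must be `<= 0`; the `penalized` helper, the example objective, and the penalty weight are all hypothetical and would need tuning on a real problem.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def penalized(
    objective: Callable[[Params], float],
    constraints: List[Callable[[Params], float]],
    weight: float = 1e3,
) -> Callable[[Params], float]:
    """Wrap a minimization objective so constraint violations add a penalty.

    Each constraint g must satisfy g(p) <= 0 for a feasible point; positive
    values are treated as the amount of violation.
    """

    def wrapped(p: Params) -> float:
        violation = sum(max(0.0, g(p)) for g in constraints)
        return objective(p) + weight * violation

    return wrapped


if __name__ == "__main__":
    # Hypothetical problem: minimize x^2 + y^2 subject to x + y >= 1.
    f = penalized(
        lambda p: p["x"] ** 2 + p["y"] ** 2,
        [lambda p: 1.0 - (p["x"] + p["y"])],
    )
    print(f({"x": 0.5, "y": 0.5}))  # feasible: no penalty added
    print(f({"x": 0.0, "y": 0.0}))  # infeasible: penalty added
```

Scaling the penalty by the violation (rather than returning a flat constant) gives the optimizer some signal about which direction leads back to feasibility, though a large weight can still create a sharp jump at the constraint boundary that smooth surrogate models may struggle with.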

Balandat commented 2 years ago

So 10K evaluations are a lot - that’s above the typical range that Ax was designed for and you’ll likely run into scalability issues if you’re using our standard models. How long does one evaluation take?
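To put 10K evaluations in perspective: an exact GP stores an n × n kernel matrix and is typically fit via a Cholesky factorization, which costs on the order of n³/3 floating-point operations. The back-of-the-envelope sketch below assumes float64 storage and a single dense matrix; real implementations refit repeatedly and allocate extra buffers, so treat these numbers as rough lower bounds rather than measurements of Ax.

```python
from typing import Tuple


def exact_gp_cost(n: int, bytes_per_float: int = 8) -> Tuple[int, float]:
    """Rough kernel-matrix memory (bytes) and Cholesky flops for n points."""
    kernel_bytes = n * n * bytes_per_float  # dense n x n matrix
    cholesky_flops = n ** 3 / 3  # standard leading-order Cholesky cost
    return kernel_bytes, cholesky_flops


for n in (100, 1_000, 10_000):
    mem, flops = exact_gp_cost(n)
    print(f"n={n:>6}: kernel matrix ~{mem / 1e6:,.1f} MB, Cholesky ~{flops:.1e} flops")
```

Going from 1K to 10K points multiplies the kernel-matrix memory by 100 (to about 800 MB in float64) and the factorization cost by 1,000, and that cost is paid every time the model is refit.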

That said, we're also planning to open-source a version of TuRBO, which would work better and scale more gracefully on this problem (cc @dme65).

tramchamploo commented 2 years ago

It just takes several dozen milliseconds.

saitcakmak commented 2 years ago

It just takes several dozen milliseconds.

It sounds like your evaluations are too cheap to warrant the use of most BO algorithms. Training the model and optimizing the acquisition function will take longer than that (a few seconds or longer depending on the setting). Using an evolutionary algorithm might be a better choice here.
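For evaluations this cheap, even a very simple evolutionary strategy can run thousands of evaluations in the time a BO loop spends on a handful. Below is a minimal stdlib sketch of a (μ + λ) evolution strategy over a mixed space: continuous values get Gaussian mutations clipped to their bounds, and categorical values are occasionally resampled. It illustrates the technique only; the `evolve` helper, the example objective, and every setting (population sizes, mutation scale, switch probability) are hypothetical, and a mature library would be preferable in practice.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, Union

Value = Union[float, str]
Point = Dict[str, Value]


def evolve(
    objective: Callable[[Point], float],
    bounds: Dict[str, Tuple[float, float]],
    choices: Dict[str, Sequence[str]],
    mu: int = 10,
    lam: int = 40,
    generations: int = 100,
    sigma: float = 0.1,
    p_switch: float = 0.2,
    seed: int = 0,
) -> Tuple[Point, float]:
    """Minimize `objective` with a simple (mu + lambda) evolution strategy."""
    rng = random.Random(seed)

    def random_point() -> Point:
        p: Point = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        p.update({k: rng.choice(list(v)) for k, v in choices.items()})
        return p

    def mutate(parent: Point) -> Point:
        child = dict(parent)
        for k, (lo, hi) in bounds.items():
            # Gaussian step scaled to the parameter range, clipped to bounds.
            step = rng.gauss(0.0, sigma * (hi - lo))
            child[k] = min(hi, max(lo, float(parent[k]) + step))
        for k, v in choices.items():
            if rng.random() < p_switch:
                child[k] = rng.choice(list(v))  # occasionally resample a category
        return child

    population: List[Tuple[float, Point]] = [
        (objective(p), p) for p in (random_point() for _ in range(mu))
    ]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population)[1]) for _ in range(lam)]
        population += [(objective(c), c) for c in offspring]
        population.sort(key=lambda t: t[0])  # parents and children compete
        population = population[:mu]  # keep the mu best (minimization)
    best_value, best_point = population[0]
    return best_point, best_value


if __name__ == "__main__":
    # Hypothetical mixed objective: the category shifts the objective value.
    offsets = {"a": 0.0, "b": 0.5, "c": 1.0}

    def f(p: Point) -> float:
        return (float(p["x"]) - 0.3) ** 2 + offsets[str(p["kind"])]

    best, value = evolve(f, {"x": (0.0, 1.0)}, {"kind": ["a", "b", "c"]})
    print(best, value)
```

Because parents compete with their offspring, the best value found never gets worse from one generation to the next; a fixed mutation scale keeps the sketch short, whereas practical strategies usually adapt it.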

tramchamploo commented 2 years ago

What if I don't care about the time spent? How do TPE and BO compare in terms of performance? I only do 10K evaluations because I'm able to; I don't need that many if the algorithm converges fast. The property I care about most is exploration vs. exploitation, because TPE always gets stuck in local minima in my experiments.

lena-kashtelyan commented 2 years ago

Hi @tramchamploo, sorry for the delay on this one. We haven't explicitly compared TPE and BO, but if you try them out on your problem, let us know what you get!

We do know that the two algorithms are generally best suited to different problems. If you're not restricted in the number of points you can try and want to try as many as possible, TPE is a reasonable choice, since BO gets slow and memory-intensive with very many points. However, if you're interested in reducing the number of trials you run, or just want a different algorithm because TPE is getting stuck on your problem, BO is a better fit.
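On the exploration vs. exploitation question: in BO this trade-off is usually controlled explicitly by the acquisition function. As one illustrative example (not necessarily the acquisition function Ax uses by default), the upper confidence bound scores a candidate as mean + sqrt(beta) * std under the GP posterior, so a larger `beta` rewards uncertainty (exploration) and a smaller one rewards a high predicted value (exploitation). The sketch is written for maximization; for a minimization problem, negate the objective. The posterior means and standard deviations below are made-up numbers, not output from a real model.

```python
import math
from typing import List, Sequence


def ucb_scores(mean: Sequence[float], std: Sequence[float], beta: float) -> List[float]:
    """Upper confidence bound for maximization: mean + sqrt(beta) * std."""
    return [m + math.sqrt(beta) * s for m, s in zip(mean, std)]


def pick(mean: Sequence[float], std: Sequence[float], beta: float) -> int:
    """Index of the candidate with the highest UCB score."""
    scores = ucb_scores(mean, std, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical posterior at three candidates: a well-explored good point,
# a mediocre point, and a poorly explored region.
mean = [1.0, 0.6, 0.4]
std = [0.05, 0.1, 0.5]
print(pick(mean, std, beta=0.1))  # small beta: favors the best predicted mean
print(pick(mean, std, beta=9.0))  # large beta: favors the uncertain region
```

With `beta=0.1` the well-explored candidate wins, while with `beta=9.0` the poorly explored one does, even though its predicted mean is the lowest.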

Among our BO offerings, TuRBO handles a large number of trials well and would likely be a very good fit for your problem, as @Balandat pointed out above. Unfortunately, it is not yet available in open-source Ax, but here is a BoTorch tutorial for it if you want to try it out: https://botorch.org/tutorials/turbo_1.
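The core mechanism in TuRBO is a trust region whose side length grows after repeated successes, shrinks after repeated failures, and triggers a restart once it collapses below a minimum. The stdlib sketch below shows only that bookkeeping, loosely following the structure of the TuRBO-1 tutorial linked above; the GP fitting and candidate generation inside the region are omitted, and all constants (including the absolute improvement threshold `min_improvement`) are illustrative assumptions rather than values taken from the tutorial.

```python
from dataclasses import dataclass


@dataclass
class TrustRegion:
    """Bookkeeping for a TuRBO-style trust region (maximization)."""

    length: float = 0.8
    length_min: float = 0.5 ** 7
    length_max: float = 1.6
    success_tolerance: int = 3
    failure_tolerance: int = 5
    min_improvement: float = 1e-3  # assumed absolute threshold for "success"
    success_counter: int = 0
    failure_counter: int = 0
    best_value: float = float("-inf")
    restart: bool = False

    def update(self, batch_best: float) -> None:
        """Update counters and side length after evaluating one batch."""
        if batch_best > self.best_value + self.min_improvement:
            self.success_counter += 1
            self.failure_counter = 0
        else:
            self.success_counter = 0
            self.failure_counter += 1

        if self.success_counter == self.success_tolerance:
            self.length = min(2.0 * self.length, self.length_max)  # expand
            self.success_counter = 0
        elif self.failure_counter == self.failure_tolerance:
            self.length /= 2.0  # shrink
            self.failure_counter = 0

        self.best_value = max(self.best_value, batch_best)
        if self.length < self.length_min:
            self.restart = True  # region collapsed: caller should restart
```

Usage: after each batch, call `update` with the best observed value and sample the next candidates from a box of side `length` (in the unit cube) centered on the incumbent; when `restart` becomes `True`, start a fresh region.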

Let us know how this sounds, and we're very curious to hear how our BO or TuRBO ends up working out for your problem!

lena-kashtelyan commented 2 years ago

I'll leave this one as "needs repro or more info" for now: either it's resolved by my response above, or we'll hear back from you at some point, @tramchamploo : )

lena-kashtelyan commented 1 year ago

Closing as inactive, please feel free to reopen.

facebook / Ax

How are category parameter values handled in Ax? #1051