callowbird / Harmonica

Code repository for the paper "Hyperparameter Optimization: A Spectral Approach" by Elad Hazan, Adam Klivans, Yang Yuan.
https://arxiv.org/abs/1706.00764
MIT License

Experiments with Synthetic functions #1

Open. mchsyu opened this issue 6 years ago

mchsyu commented 6 years ago

Could you give some guidelines for the whole section? For instance, is the sparse vector "s_i,j" in this section the same as the "s-sparse vector x" in the proof of Theorem 6? And why does it contain 5 weight-feature pairs?

callowbird commented 6 years ago

Which "s_i,j" are you referring to?

Based on our theorem, in order to approximate a decision tree of size s, one needs to learn a polynomial of sparsity s^2/epsilon. In practice, we just pick a sparsity of 5 for every stage, but it may differ depending on your application (i.e., the size of the decision tree that approximates the hyperparameter space).
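In case it helps, here is a minimal sketch of what the sparse recovery in one stage looks like in Python, using scikit-learn's Lasso on low-degree parity (Fourier) features and keeping the 5 largest coefficients. The toy `objective`, the variable counts, and the regularization strength are placeholders for illustration, not the repo's actual API.

```python
# Minimal sketch of one sparse-recovery stage (illustrative, not the repo's code).
# Hyperparameters are encoded as +/-1 booleans; `objective` stands in for the
# real black-box function (e.g., validation loss of a trained network).
import itertools
import numpy as np
from sklearn.linear_model import Lasso

n_vars, degree, n_samples, sparsity = 20, 3, 300, 5   # sparsity = "5 per stage"

def objective(x):
    # Toy stand-in for the real training/validation loss.
    return -(x[0] * x[1] + x[2])

def parity_features(X, degree):
    """Products of up to `degree` variables: the low-degree Fourier basis."""
    monomials = [c for d in range(1, degree + 1)
                 for c in itertools.combinations(range(X.shape[1]), d)]
    feats = np.column_stack([X[:, list(c)].prod(axis=1) for c in monomials])
    return feats, monomials

X = np.random.choice([-1.0, 1.0], size=(n_samples, n_vars))
y = np.array([objective(x) for x in X])               # black-box evaluations

F, monomials = parity_features(X, degree)
lasso = Lasso(alpha=0.1).fit(F, y)                    # sparse Fourier recovery
top5 = np.argsort(-np.abs(lasso.coef_))[:sparsity]    # keep 5 monomials
important = sorted({i for k in top5 for i in monomials[k]})
print("important variables for this stage:", important)
```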

mchsyu commented 6 years ago

I am referring to the first sentence of the 2nd paragraph in Sec. 5.4: "... in the i-th stage (i = 0, 1, 2), it has 32^i sparse vectors s_i,j for j = 0, ..., 32^i − 1."

The answer you gave above means that the Fourier coefficients alpha (in Fig. 1) have at most s^2/epsilon nonzero entries, each of magnitude at least epsilon/s (Fact 4), doesn't it? So I understand that you pick 5 for each stage for convenience.

So, if the s_i,j mentioned above denotes the learned sparse Fourier coefficients (from Lasso), how come there are 32^i different vectors? Is there any reference for this? Or did I misunderstand something?

callowbird commented 6 years ago

That's a good question! Section 5.4 is a synthetic experiment, which minimizes an arbitrary function that we defined ourselves. The main feature of this function is that it has a hierarchical structure, so if you find a good sparse approximation in the first stage, you will do well in the later stages. But this is just one simple example, so please don't worry too much about it.
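For intuition, here is a hypothetical example of such a hierarchically structured objective (it is not the exact function from Section 5.4): later stages only contribute once the earlier stages' variables are set well.

```python
# Hypothetical hierarchically structured objective, for intuition only;
# not the exact synthetic function used in Section 5.4 of the paper.
import numpy as np

def hierarchical_objective(x):
    """x is a +/-1 vector of 30 boolean hyperparameters, 10 per stage."""
    s0 = x[0] * x[1] + x[2]                  # stage 0: a sparse parity term
    s1 = (x[10] * x[11] - x[12]) * (s0 > 1)  # stage 1 only matters if stage 0 is good
    s2 = (x[20] + x[21] * x[22]) * (s1 > 1)  # stage 2 depends on stage 1
    return -(s0 + s1 + s2)                   # minimize, so negate the "score"

x = np.random.choice([-1, 1], size=30)
print(hierarchical_objective(x))
```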

Harmonica can be used for tuning any problem, but it may not be as good as other tools (e.g., Spearmint) in all cases. Our paper shows that if your problem is approximately a decision tree, then Harmonica has nice sample-complexity and approximation guarantees; that is, Harmonica will do well.

I guess your question is that, in the synthetic experiment, the function is dense rather than sparse, so it does not meet our assumptions. We only showed that if the problem is approximately a tree, Harmonica works well; we did not claim that if the problem is not approximately a tree (e.g., the synthetic experiment), Harmonica works poorly. In fact, in some such cases it can work well too. Sorry for the confusion!

mchsyu commented 6 years ago

I get it. Thank you very much.

My other question is: if we let f_i (line 5 of Algorithm 3) be the same as the setting in the paper (the expected restriction of f according to the minimizers M_i), how can we take the expectation of a "neural network" over different hyperparameter configurations, and then use this "single" expected function to run another PSR (Procedure 2) for the later stage?

My guess is that you can just keep t neural networks, each with a different hyperparameter configuration, and use them together in the later stage. Am I right?

callowbird commented 6 years ago

That's another nice question :) You just need to keep t configurations, and for the next stage, you first sample one of the t configurations, and then do uniform random sampling for the remaining hyperparameters. At the end of the stage, you simply run Harmonica on the remaining hyperparameters, without thinking about the t configurations.

By doing that, it's essentially the same as your idea of keeping t neural networks, but you never need to build the actual networks.
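As a concrete illustration, here is a small sketch of that sampling scheme (illustrative only, not the repo's code): keep the t best configurations of the already-fixed variables, pick one of them uniformly at random, and fill in the remaining hyperparameters uniformly at random.

```python
# Sketch of the sampling scheme described above (illustrative, not the repo's code).
# `fixed_configs` holds the t best assignments of the variables fixed in earlier
# stages; the remaining variables are drawn uniformly at random each time.
import numpy as np

def sample_next_stage(fixed_configs, fixed_idx, n_vars, rng=np.random.default_rng()):
    x = rng.choice([-1.0, 1.0], size=n_vars)                # uniform over all variables
    cfg = fixed_configs[rng.integers(len(fixed_configs))]   # pick one of the t configs
    x[fixed_idx] = cfg                                       # overwrite the fixed part
    return x

# Example: t = 3 stored configurations over 4 previously-fixed variables.
fixed_idx = [0, 1, 2, 3]
fixed_configs = np.random.choice([-1.0, 1.0], size=(3, 4))
sample = sample_next_stage(fixed_configs, fixed_idx, n_vars=20)
```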

mchsyu commented 6 years ago

It seems that just using the best hyperparameter configuration would be good enough, right? I might write you an e-mail about a visit to Taiwan, if you are interested.

callowbird commented 6 years ago

Yes, setting t=1 would be good enough; we just wanted to make it more general. I would be happy to hear more details about the visit! Please send it to callowbird@gmail.com.