facebookresearch / nevergrad

A Python toolbox for performing gradient-free optimization
https://facebookresearch.github.io/nevergrad/
MIT License
3.94k stars 352 forks source link

What are the best practices for circular parameters? #759

Open JackMaguire opened 4 years ago

JackMaguire commented 4 years ago

First off, thanks for making Nevergrad open to the public. This is a great resource and my team loves using it to model and design proteins. The API is incredibly easy to use for us novices.

I am curious about the best way to represent things like angles. I can imagine bounding them at 0-360 degrees but I worry about the edge effects. The distance between 2 degrees and 358 degrees would be calculated to be 356 instead of 4. Likewise, an agent/particle at 359 would not be able to roll over to 0.

My team is considering splitting our angles into two parameters (x, y) that are each bounded (-1,1) and interpreting the angle as arctan( y / x ) with some additional quadrant logic. But before we do that, do you have any recommendations of better ways to handle this case?

jrapin commented 4 years ago

Hi!

First off, thanks for making Nevergrad open to the public. This is a great resource and my team loves using it to model and design proteins. The API is incredibly easy to use for us novices.

Thank you for your feedback! If you can, don't hesitate to share what has been working for you, and what has not, that's helpful information for us ;)

I am curious about the best way to represent things like angles.

In that case my initial thought would be to use an unbounded variable, then remove the redundant part ( 372 -> 12) within your function if that's not too complicated. This way it's transparent for the optimizer, the function becomes periodic and most optimizers should be able to deal with it. This is just my intuition though, and Olivier may have more insights and knowledge about this (but may take a few days to answer because he is currently very busy). @teytaud wdyt?

JackMaguire commented 4 years ago

Thanks for the quick feedback!

the function becomes periodic and most optimizers should be able to deal with it.

That's very useful for me to know. I was worried about leaving it unbounded because I would worry that the optimizers would waste time sampling redundant space.

For context, we're finding that RealSpacePSO and DoubleFastGADiscreteOnePlusOne work best for us. I can't picture how RealSpacePSO handles periodicity but I'm a complete novice so there are tons of things I can't picture.

That said, these are the two optimizers that work best with the bounds. Are there any particular optimizers that work very well with periodic parameters?

jrapin commented 4 years ago

I can't picture how RealSpacePSO handles periodicity but I'm a complete novice so there are tons of things I can't picture.

I'd say I'm a novice too, but I wouldn't be worrying too much about PSO in these settings, it's pretty robust to local minima, so periodicity should not matter too much. I was also considering the option of implementing a transform which would do the "modulo mapping". That would "reset" the value to be always between 0 and 360. However in that case PSO could act in a really weird way, because there is some kind of inertia which could go completely wrong on the borders. So the safest bet for PSO in my opinion is just to do unconstrained optimization and reset to 0-360 within your function. But I may be completely wrong, it's always worth trying many options!

DoubleFastGADiscreteOnePlusOne

I never had a very close look into this one, @teytaud knows more about it.

That said, these are the two optimizers that work best with the bounds. Are there any particular optimizers that work very well with periodic parameters?

I have really no idea about this, I don't think we even have such a testbed in our test suit, @teytaud any insights?

teytaud commented 4 years ago

Quite interesting point! Maybe we should implement toroidal parameters. Bounded parameters, but in the mutation, instead of mirroring, we apply a modulo. Should be short to implement... Jeremy, a bit of guidance from you about where I should code this and I do it ? For crossover operators it's complicated, for sure... but if I understand correctly the best algorithms you have do not use crosssover operators.

teytaud commented 4 years ago

(DoubleFastGA is our adaptation to multivalued discrete variables (posssibly infinitely many valued...) of the FastGA by Doerr et al, also close to PortfolioDiscreteOnePlusOne)

jrapin commented 4 years ago

Should be short to implement... Jeremy, a bit of guidance from you about where I should code this and I do it ?

We must talk this through before implementing anything, because I see two options for having something like this, but they are very different in term of both implementation and result:

JackMaguire commented 4 years ago

Thank you both for your insight! I ran my benchmark with bounding the angles from [-225 degrees, 225 degrees]* (blue) and with leaving them unbounded (orange). It does not seem that one is better than the other. Please let me know if you add a feature for this use case and I will be sure to test it!

*I thought adding redundant padding on each end might reduce edge effects, hence [-225, 225] instead of [-180, 180]

RealSpace PSO

image

N=17

DoubleFast

image

N=75

teytaud commented 4 years ago

I come back on this. Implementing something in Transform is complicated, for the reasons you have pointed out, @jrapin . I see three possibilities: (a) implementing more sophisticated transforms, possibly embedding an operator for weighted average. I think this solves everything. (b) an angle is defined by 4 unbounded variables (yes, four variables :-) ). x1,x2,x3,x4 represents a point (a,b)=r cos(theta) with (a,b) the weighted average of (1,1), (-1,-1), (1,-1), (-1,1) weighted by exp(x1), exp(x2), exp(x3), exp(x4). (c) more hackish: we keep the same transform structure, but with a memory. The representation of an angle, for example, is between last_angle-180 and last_angle+180, so that averages can make sense if the two transformation operators are called next to each other... dirty but could work :-)

Is it a problem if a Transform maps a ndarray of length x to a ndarray of length 4*x ?

teytaud commented 4 years ago

very draft for the moment: https://github.com/facebookresearch/nevergrad/pull/779/files

JackMaguire commented 4 years ago

Hi @teytaud. Is the draft from your previous comment in a useable state? I'm getting error messages when I try to use it and I'm wondering if I'm jumping the gun.

JackMaguire commented 3 years ago

Hi all, I just wanted to check in to see if there was still any interest in this. I'm happy to help in any way possible.

jrapin commented 3 years ago

Hi all, I just wanted to check in to see if there was still any interest in this. I'm happy to help in any way possible.

Hi, I've recently merged an (undocumented) update that lets you write a "modulo Scalar", param = ng.p.Scalar() % 10 would basically be a scalar modulo 10. It's still not the best option as for instance the average of 1 and 9 with this type of variable would still be 5 while you probably would want 0. In any case, this update was made possible by a new "layer" system that performs operation on the underlying data #1045. I expect @teytaud proposition in #779 should be rewritten under this format to be more maintanable, but I have not had time to try and understand the equations, nor investigate if this would be better :s If you want to give it a go, feel free ;)

JackMaguire commented 3 years ago

Thanks for the update, Jeremy! I'll give modulo a shot. Maybe not with PSO (for the reason you stated) but DoubleFastGA should work I think

jrapin commented 3 years ago

Hi @JackMaguire I've revamped @teytaud's proposition and added ng.p.Angles function that creates an Array representing angles. This is using the same very recent refactoring as the modulo. Not too sure yet what should be the API (ng.p.Angles is not actually a class, I've used a capitalized letter to make it look similar as the rest of the API, but it's just a function, so it should probably be uncapitalized, so expect a few changes until we converge somewhere :s ). If you have time to try it out, let me know, we don't have actual test sets to check if this is helpful or not, but that's the last possible best practice we can provide ;)

JackMaguire commented 3 years ago

Hi @jrapin Thanks so much for adding this! I'll try it out on my test set this week and will let you know how it goes. Even if the results are comparable I'm still looking forward to having this more formal representation to help me feel comfortable that my code works. I'll reply again soon after my benchmark comes back.

JackMaguire commented 3 years ago

In terms of API, maybe something with the units would be useful? Like np.p.degrees has ng.p.Array() % 360 and np.p.radians has ng.p.Array() % 2*PI?

JackMaguire commented 3 years ago

Oddly I'm actually seeing slightly worse results with modulo with DoubleFastGADiscreteOnePlusOne, though the results are still preliminary and noisy. I'm comparing ng.p.Array( shape=(20,) ) % 4 vs ng.p.Array( shape=(20,) ).set_bounds( -2.5, 2.5 ) and the modulo option is giving me a loss that is only 92% as good as the bounded range's results.

(I multiply the values by 90 to get the full 360 degrees, though the second option has some extra buffer to prevent edge effects)

I'll play with it some more to see if it's consistent

jrapin commented 3 years ago

In terms of API, maybe something with the units would be useful? Like np.p.degrees has ng.p.Array() % 360 and np.p.radians has ng.p.Array() % 2*PI?

I'm a bit reluctant to do this, because that can hide a lot of details. Eg: mutation by default will be 1, and 1 radian or 1 degree is definitely not equivalent :s That being said, I've just merged #1078 and added a parameter deg which can be set to true for degrees from -180 to 180 instead of gradients (same behavior as in np.angles). Now you can actually just do ng.p.Angles(shape, deg=True) + 180 to get it between 0 and 360. I've also a bit simplified it since the last time I mentioned it, may be worth a go if you have time.

the modulo option is giving me a loss that is only 92% as good as the bounded range's results

Hard to know how parametrization interacts with optimization, hence the complete decoupling so far although that can be messy :(. It may still be worth checking PSO or other optimizers with the modulo as well, because in practice the pool of particles move together and I doubt the averaging issue will be a big problem. And of course, may be worth checking with all of them with the new ng.p.Angles option. Again, maybe it's just no better :s

JackMaguire commented 3 years ago

Thanks for the update and insight, Jeremy!

Again, maybe it's just no better :s

From a user point of view, it's still a very useful API even if the results are the same.

It may still be worth checking PSO or other optimizers with the modulo as well

Good idea, I'll give it a shot