flaport / inverse_design

https://flaport.github.io/inverse_design
Apache License 2.0

Strict Symmetry #10

Open jan-david-fischbach opened 1 year ago

jan-david-fischbach commented 1 year ago

@flaport suggested in #1 that a symmetry constraint can be put on the design by adding the transformed latent design to its mirror image. As far as I can see this only leads to an almost symmetric design (counterexample below). I believe this might arise from the fact that the touches are selected one after the other. Any idea how to mitigate this effect?

[two screenshots attached]
flaport commented 1 year ago

Since the result is almost symmetric, you might be able to overlay the final result with its symmetric counterpart again without creating too many violations.

Something like this:

latent_t = latent_t + latent_t.T  # symmetrize the latent design
mask = generate_mask(latent_t, ...)  # generate the mask from the symmetrized latent design
mask = mask + mask.T  # overlay the generated mask with its transpose

Of course this might slightly break the strict fabrication constraints guaranteed by the algorithm, but my guess is that the violations will be minimal.

To ensure both strict fabrication constraints and symmetry constraints, we might have to adapt the algorithm to only look in a smaller latent space and immediately apply the same touch at its symmetric counterpart (for which we don't have a latent space). But such an adaptation will be more difficult, and I think the method proposed above might be good enough.

flaport commented 1 year ago

Of course you could also apply symmetry after each generator step and then re-run the pixel-resolving steps, producing a symmetric algorithm that way. That might also be good enough.
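
A minimal sketch of that idea in JAX, assuming a hypothetical `generator_step` callback for a single touch (not the actual API of this repo):

```python
import jax.numpy as jnp

def symmetrize(x):
    # enforce transpose symmetry by averaging with the mirrored copy
    return 0.5 * (x + x.T)

def run_symmetric_generator(latent_t, generator_step, num_touches):
    # `generator_step` stands in for one touch-placement step of the generator;
    # the design is re-symmetrized after every step before the next touch is
    # resolved, so the end result stays strictly symmetric.
    latent_t = symmetrize(latent_t)
    design = jnp.zeros_like(latent_t)
    for _ in range(num_touches):
        design = symmetrize(generator_step(latent_t, design))
    return design
```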

jan-david-fischbach commented 1 year ago

The following also seems to work: if in the selection step there is no definite maximum (multiple maxima with (almost) the same value), select all of them. Tested here https://github.com/Jan-David-Black/inverse_design_strict_fabrication/tree/optimization_experiments in the local generator notebook.
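
A rough sketch of that tie-breaking rule, assuming the rewards of all candidate touches are available as one array (names are illustrative, not the repo's actual API):

```python
import jax.numpy as jnp

def select_best_touches(reward, rtol=1e-6):
    # Instead of a single argmax, return a boolean mask of every touch whose
    # reward is within a small tolerance of the maximum, so that symmetric
    # partner touches are applied in the same step.
    best = jnp.max(reward)
    return reward >= best - rtol * jnp.abs(best)
```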

lucasgrjn commented 1 year ago

There is one really weird thing I can't figure out with @flaport's notebook11: I need to wait at least 10 iterations before the first change in the design...

jan-david-fischbach commented 1 year ago

@Dj1312 I also observed that behavior. It is highly dependent on the initialization of the latent space. Additionally, the optimization step_size plays into it: with a larger step one reaches a change in the design more quickly.

jan-david-fischbach commented 1 year ago

This effect is the reason I implemented cache=True in javiche, so that the ceviche simulation is only re-run once the design has actually changed. As a result, a fast generator implementation is even more crucial. I am unsure whether something similar was implemented in the original paper.
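
The actual javiche implementation is not shown here; a minimal sketch of the caching idea, keyed on the binary design so the expensive ceviche solve is skipped while the design stays the same, could look like this:

```python
import functools
import numpy as np

def cache_on_design(sim_fn):
    # Memoize on the design's byte representation: re-run the (expensive)
    # ceviche simulation only when the generated design actually changes.
    last = {"key": None, "value": None}

    @functools.wraps(sim_fn)
    def wrapped(design, *args, **kwargs):
        key = np.asarray(design).tobytes()
        if key != last["key"]:
            last["key"] = key
            last["value"] = sim_fn(design, *args, **kwargs)
        return last["value"]

    return wrapped
```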

lucasgrjn commented 1 year ago

I want to check a couple of details with some Ceviche scripts; there are a few things I find odd...

Another possible issue is the fact that we don't do a complete convolution to obtain the reward array at the beginning. From the paper, part 2.B: "In practice, we greedily select the best touch as computed from a pixel reward array θ, where θ is obtained from a latent design produced in the course of an optimization, and is fixed during each run of the generator. The total reward of a solid touch is the sum of θ elements that would be set to solid were the touch to be made, i.e. the sum of θ elements set by the touch. The reward of a void touch is the negative sum of elements set to void by the touch." Just in case, they also give the ADAM parameters: lr=0.01, beta1=0.667, beta2=0.9.
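
For illustration only, a sketch of computing the reward array for all touch centres at once as a 2D convolution of θ with the brush footprint (this ignores the "would be set" subtlety for already-resolved pixels), together with the quoted ADAM settings written out with optax; the function and variable names are ours:

```python
import jax.numpy as jnp
from jax.scipy import signal
import optax

def touch_rewards(theta, brush):
    # theta: pixel reward array; brush: binary brush footprint.
    # Summing theta over the brush footprint for every possible touch centre
    # is a single 2D convolution; the paper additionally excludes pixels that
    # are already resolved, which is omitted in this sketch.
    solid_reward = signal.convolve2d(theta, brush, mode="same")
    void_reward = -solid_reward  # void touch reward is the negative sum
    return solid_reward, void_reward

# ADAM parameters quoted above
optimizer = optax.adam(learning_rate=0.01, b1=0.667, b2=0.9)
```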

Finally a comment about the beta parameter of the transform: as mentioned, it is in the range of 2-8. Generally, we begin with the smallest value (2), then we make some iterations, then we increase it, and so on. The idea is to push more "aggressively" (towards either void or solid) as the optimization moves towards a solution.
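
A small sketch of what such a beta schedule on the tanh transform could look like (the schedule values and function names are illustrative, not taken from the repo):

```python
import jax.numpy as jnp

def transform(latent, beta):
    # saturating projection: larger beta pushes the latent values harder
    # towards -1 (void) or +1 (solid)
    return jnp.tanh(beta * latent)

def beta_schedule(step, total_steps, betas=(2.0, 4.0, 8.0)):
    # step the sharpness up in stages as the optimization progresses
    idx = min(step * len(betas) // total_steps, len(betas) - 1)
    return betas[idx]
```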

lucasgrjn commented 1 year ago

Just figured out we need to use the loss function (softplus and so on) defined in the paper to get a closer result. Indeed, for our case (the waveguide bend) this would put more weight on the S11 gradient. (I made a notebook which shows a gradient shape very similar to the updated contour of Fig. 5.)
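
The paper's exact loss is not reproduced in this thread; purely as an illustration, a softplus-style objective that penalizes reflection (|S11|²) and rewards transmission (|S21|²) might look like this (thresholds and sharpness are made up):

```python
import jax
import jax.numpy as jnp

def softplus_loss(s11, s21, t_min=0.9, r_max=0.01, sharpness=10.0):
    # smooth hinge penalties on reflected power above r_max and on
    # transmitted power below t_min; not the paper's exact formulation
    refl = jnp.abs(s11) ** 2
    trans = jnp.abs(s21) ** 2
    return (jax.nn.softplus(sharpness * (refl - r_max))
            + jax.nn.softplus(sharpness * (t_min - trans))) / sharpness
```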

@Jan-David-Black, I have some problems with this now since the inner_loss_fn now always takes one argument but returns multiple s_values. (I will try to fix that and push the solution to the Javiche repo.) @flaport, when it works, I will push it as a PR.

jan-david-fischbach commented 1 year ago

> I have some problems with this now since the inner_loss_fn now always takes one argument but returns multiple s_values

Could you return those as an array?
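
For example (a sketch only; `simulate` is a placeholder for the cached ceviche call, not an existing function):

```python
import jax.numpy as jnp

def make_inner_loss_fn(simulate):
    def inner_loss_fn(design):
        # run the simulation once and stack all scattering parameters into a
        # single array, keeping the one-argument / one-output signature
        s11, s21 = simulate(design)
        return jnp.stack([s11, s21])
    return inner_loss_fn
```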

[screenshot attached]

The original paper also does not seem to adhere to strict symmetry...

flaport commented 1 year ago

I added an attempt to include symmetry in the rust generator (it's in the symmetry branch). Currently it supports 'mirror' and 'transpose' symmetry. Using this method you indeed get a truly symmetric result.

I had some problems with larger brush sizes though... somehow I ran into an infinite loop for a (10,2) notched brush and I haven't figured out why yet.

But feel free to check it out and see if you run into any problems.

lucasgrjn commented 1 year ago

> I have some problems with this now since the inner_loss_fn now always takes one argument but returns multiple s_values
>
> Could you return those as an array? The original paper also does not seem to adhere to strict symmetry...

I think they don't enforce any symmetry for the first iteration (perhaps to avoid pushing the optimisation in a wrong direction).

My actual gradient looks like this:

I am actually pretty happy with the direction we are going :) I need to fix Javiche (maybe you can now simply rename it to JaxIt?) to work with Jacobians, and then we will see the next results.

jan-david-fischbach commented 1 year ago

Turns out reducing the bias and random spread already goes a long way to get the optimization going (without many iterations at the same design):

[screenshot attached]
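
A sketch of what such a latent initialization might look like (parameter names and values are illustrative, not the ones actually used):

```python
import jax

def init_latent(key, shape, bias=0.1, spread=0.05):
    # start the latent design close to the decision threshold (small bias)
    # with little random scatter (small spread), so the first optimizer
    # steps can already flip touches in the generated design
    return bias + spread * jax.random.normal(key, shape)
```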
jan-david-fischbach commented 1 year ago

> Finally a comment about the beta parameter of the transform: as mentioned, it is in the range of 2-8. Generally, we begin with the smallest value (2), then we make some iterations, then we increase it, and so on. The idea is to push more "aggressively" (towards either void or solid) as the optimization moves towards a solution.

Where did you find this?

flaport commented 1 year ago

Could we take the beta parameter as an optimizable parameter? Possibly bounded within a range by a sigmoid?
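
A minimal sketch of that bounding trick (the range is the 2-8 mentioned above; everything else is illustrative):

```python
import jax

def bounded_beta(raw_beta, beta_min=2.0, beta_max=8.0):
    # map an unconstrained, optimizable parameter onto (beta_min, beta_max)
    # with a sigmoid, so beta can be learned while staying in range
    return beta_min + (beta_max - beta_min) * jax.nn.sigmoid(raw_beta)
```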

lucasgrjn commented 1 year ago

Ok, so!

By definition, a hyperparameter is a parameter whose purpose is to "control the learning".

You can view the tanh function as a kind of saturator: it pushes values close to an extremum (-1 for void / +1 for solid in the paper).

Yes, of course, we can imagine using the beta parameter as an optimizable parameter. But I am afraid it won't be useful... (and in that case, it would no longer be a hyperparameter).

lucasgrjn commented 1 year ago

> Turns out reducing the bias and random spread already goes a long way to get the optimization going (without many iterations at the same design):
>
> [screenshot attached]

@Jan-David-Black, could you share the new parameters you used? I would like to give it a try and see if I get the same results!

jan-david-fischbach commented 1 year ago

https://github.com/Jan-David-Black/inverse_design_strict_fabrication/tree/optimization_experiments

lucasgrjn commented 1 year ago

Thanks! I understand we need to reduce the bias and random spread to be close to the "threshold". But I don't see how they converged so quickly in the paper...