flaport / inverse_design

https://flaport.github.io/inverse_design
Apache License 2.0

Loss function #18

Closed jan-david-fischbach closed 1 year ago

jan-david-fischbach commented 1 year ago

Hey @flaport, @Dj1312, I am currently trying to get the loss function right. Unfortunately, my convergence behavior is still very different from the paper and I suspect the culprit lies here:

Screenshot 2023-02-10 at 15 11 22
import jax
import jax.numpy as jnp

# Power reflection/transmission from the complex s-parameters
# (the leading axis of s_params runs over the simulated wavelengths).
s11 = jnp.abs(s_params[:, 0, 0]) ** 2
s21 = jnp.abs(s_params[:, 0, 1]) ** 2

s = jnp.stack((s11, s21))
# +1: penalize s11 above its target; -1: penalize s21 below its target
g = jnp.stack((jnp.ones_like(s11), -jnp.ones_like(s21)))

# dB specs converted to linear amplitude (-0.5 dB transmission, -20 dB reflection) ...
t_s21 = 10 ** (-0.5 / 20)
t_s11 = 10 ** (-20 / 20)

# ... and squared to linear power so they are comparable with s11/s21 above
target = jnp.stack((jnp.ones_like(s11) * (t_s11 ** 2), jnp.ones_like(s21) * (t_s21 ** 2)))
w_min = min(1 - t_s21, t_s11)
L = jnp.sum(jax.nn.softplus(g * (s - target) / w_min) ** 2)
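
For reference, here is a self-contained way to exercise that loss on placeholder data and confirm it is differentiable end to end in JAX. The complex s_params below are made up (not simulation output), and the assumed shape (n_wavelengths, 1, 2) is just my reading of the indexing above:

import jax
import jax.numpy as jnp

def loss_fn(s_params):
    # Same expression as above, wrapped in a function so it can be differentiated.
    s11 = jnp.abs(s_params[:, 0, 0]) ** 2
    s21 = jnp.abs(s_params[:, 0, 1]) ** 2
    s = jnp.stack((s11, s21))
    g = jnp.stack((jnp.ones_like(s11), -jnp.ones_like(s21)))
    t_s21 = 10 ** (-0.5 / 20)
    t_s11 = 10 ** (-20 / 20)
    target = jnp.stack((jnp.ones_like(s11) * (t_s11 ** 2), jnp.ones_like(s21) * (t_s21 ** 2)))
    w_min = min(1 - t_s21, t_s11)
    return jnp.sum(jax.nn.softplus(g * (s - target) / w_min) ** 2)

# Placeholder complex s-parameters for 3 wavelength points.
dummy_s = jnp.full((3, 1, 2), 0.5 + 0.1j, dtype=jnp.complex64)
value, grads = jax.value_and_grad(loss_fn)(dummy_s)  # grads w.r.t. the complex s-params
print(value, grads.shape)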
lucasgrjn commented 1 year ago

The culprit is definitely here; this is what I pointed out in #10. I am also investigating, testing things in notebooks, to find where the issue is and solve it.

If you take a look at your implementation, it is not exactly the one from the paper. It should be:

L = jnp.linalg.norm(jax.nn.softplus(g*(s-target)/w_min)**2)**2
jan-david-fischbach commented 1 year ago

norm without axis or ord returns the two-norm, correct? So the inner **2 should already be contained in the norm? In my case I have the inner **2 because I compute the norm manually as a sum of squares. Therefore I think the equation and my code do the same thing; am I wrong?
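
A quick numerical check of that equivalence (purely illustrative):

import jax.numpy as jnp

x = jnp.array([0.3, 1.2, 2.5])
manual = jnp.sum(x**2)              # "norm computed manually": sum of squares
via_norm = jnp.linalg.norm(x) ** 2  # two-norm, squared afterwards
print(jnp.allclose(manual, via_norm))  # True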

lucasgrjn commented 1 year ago

Your equation is not wrong! (As you pointed out, you directly cancel the square root contained in the L2 norm.) However, I have no idea whether these additional operations can add some small errors, and whether those small errors could push us closer to the "paper solution".

Disclaimer: I don't think they will... Hence, I think your solution is better!

jan-david-fischbach commented 1 year ago

I tried the following: running the optimization loop, including the loss from above, for the mode converter, but ignoring the generator.

Screenshot 2023-02-10 at 20 06 11

When comparing that to the paper:

Screenshot 2023-02-10 at 20 08 18

I have a hard time believing that the loss is reduced more strongly with the fabrication constraints than without them. What do you think @flaport, @Dj1312?

flaport commented 1 year ago

Hmm, interesting... I would indeed expect the loss with fabrication constraints to be a lot higher. Maybe it's a dumb normalization factor or something?

jan-david-fischbach commented 1 year ago

I thought it might be a **2 somewhere.

jan-david-fischbach commented 1 year ago

I have been trying quite desperately to recreate the graph with the generator, but it is always much worse in terms of normalized loss :/

jan-david-fischbach commented 1 year ago

Ah, it might also be related to the wavelength bands. I had disabled those in favor of a single wavelength to speed up the simulation. I'll try the unconstrained optimization with bands to check...
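
For context: the loss above indexes s_params[:, 0, 0] and s_params[:, 0, 1], so it already sums the penalty over whatever leading (wavelength) axis s_params has; a band versus a single wavelength only changes the length of that axis. A tiny shape illustration (the shapes are my assumption, not taken from the repo):

import jax.numpy as jnp

s_single = jnp.zeros((1, 1, 2), dtype=jnp.complex64)   # one wavelength
s_band = jnp.zeros((11, 1, 2), dtype=jnp.complex64)    # e.g. an 11-point band
print(s_single[:, 0, 0].shape, s_band[:, 0, 0].shape)  # (1,) (11,)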

lucasgrjn commented 1 year ago

@Jan-David-Black I think there is a subtlety! You use a definition that converts the dB values with a factor 10. But in our case the s_params are defined as a ratio of powers, so a factor 10 would be more relevant.

I plotted the two graphs in my notebook without the generator (as you suggested), using t_sij = 10**(-x/20) and t_sij = 10**(-x/10) in the loss function. And indeed, the loss seems to be reduced more than with the fabrication constraints! :)

PS: I use the following parameters to generate the initial latent: bias=0.95, r=1, r_scale=1e-3

jan-david-fischbach commented 1 year ago

> @Jan-David-Black I think there is a subtlety! You use a definition that converts the dB values with a factor 10 (I think you meant 20 here?). But in our case the s_params are defined as a ratio of powers, so a factor 10 would be more relevant.
>
> I plotted the two graphs in my notebook without the generator (as you suggested), using t_sij = 10**(-x/20) and t_sij = 10**(-x/10) in the loss function. And indeed, the loss seems to be reduced more than with the fabrication constraints! :)
>
> PS: I use the following parameters to generate the initial latent: bias=0.95, r=1, r_scale=1e-3

Hm, but we do square the target s-params to get their "power-representation" in this (somewhat ugly) line:

target = jnp.stack((jnp.ones_like(s11)*(t_s11**2),jnp.ones_like(s21)*(t_s21**2)))

So dividing by 20 should be correct, no?

Or am I completely missing the point and ceviche-challenges returns power s_params? That would be quite unconventional, no?
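
For what it's worth, the two conversions agree once the amplitude target is squared, which is exactly what the target line above does. A quick check (purely illustrative):

x_db = 0.5                                   # e.g. the 0.5 dB transmission spec
amp_then_square = (10 ** (-x_db / 20)) ** 2  # /20 (amplitude) rule, then squared
power_directly = 10 ** (-x_db / 10)          # /10 (power) rule
print(abs(amp_then_square - power_directly) < 1e-12)  # True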

lucasgrjn commented 1 year ago

> Or am I completely missing the point and ceviche-challenges returns power s_params? That would be quite unconventional, no?

We want to maximize the power going through the output compared to the reflected power. If you take a look at the ceviche-challenges source code, the definition of the s_params is made using the overlap between an E and an H field, so a form of power. (Disclaimer: I am not really familiar with S-parameters in electronic circuits.)

But to my understanding, if we worked only with a plain field amplitude, we would use the factor 20. In our case, I would lean toward the factor 10.

But like you, TBH, when I look at the equation and the square, I am more doubtful.

The only thing making me lean toward the 10 (power factor) is the following sentence: "For example, a minimum transmission amplitude cutoff of 0.5 (-3 dB in power transmission) would have a value of 0.5", plus the fact that Tab. I of the article gives the power scattering parameters.

jan-david-fischbach commented 1 year ago

> For example, a minimum transmission amplitude cutoff of 0.5 (-3 dB in power transmission) would have a value of 0.5, plus the fact that Tab. I of the article gives the power scattering parameters.

Well, maybe I just made the wrong assumptions. I am going to have a deeper look.

jan-david-fischbach commented 1 year ago

Here at least they use 20: https://github.com/google/ceviche-challenges/blob/6352656f902dabacea88e123c89dde13dd8a3160/ceviche_challenges/scattering_test.py#L43-L44

lucasgrjn commented 1 year ago

Yes, I agree, it is confusing...

ianwilliamson commented 1 year ago
Screenshot 2023-02-10 at 15 11 22

The scattering parameter arrays that enter into this loss function (S) correspond to the complex-valued scattering parameters returned by the ceviche-challenges model instance. This means that |S|^2 are the power scattering parameters, i.e. a value of 1.0 corresponds to full transmission and 0.5 to half power transmission (-3.0 dB on a log scale). We used the linear-scale power quantities in the loss function for the optimizations in the paper, not dB / log-scale quantities.
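
So, concretely (made-up number, just to illustrate the convention described above): the loss works with |S|^2 on a linear scale, and dB only enters when translating specs from the paper:

import jax.numpy as jnp

s21 = jnp.asarray(0.6 + 0.38j)   # made-up complex scattering parameter
p21 = jnp.abs(s21) ** 2          # linear power transmission, as used in the loss
p21_db = 10 * jnp.log10(p21)     # dB form, only for reading/reporting specs
print(p21, p21_db)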

jan-david-fischbach commented 1 year ago

👍🏻 Perfect. Then it should be exactly as implemented. We only needed the dB conversion to convert the target values to linear scale.

lucasgrjn commented 1 year ago

Thanks @ianwilliamson! Now it all makes sense.

jan-david-fischbach commented 1 year ago

But that means that this still holds, right?

> I tried the following: running the optimization loop, including the loss from above, for the mode converter, but ignoring the generator.
>
> Screenshot 2023-02-10 at 20 06 11
>
> When comparing that to the paper:
>
> Screenshot 2023-02-10 at 20 08 18
>
> I have a hard time believing that the loss is reduced more strongly with the fabrication constraints than without them. What do you think @flaport, @Dj1312?

lucasgrjn commented 1 year ago

> But that means that this still holds, right?
>
> I have a hard time believing that the loss is reduced more strongly with the fabrication constraints than without them.

Yep... this issue remains unresolved.

lucasgrjn commented 1 year ago

@Jan-David-Black some thoughts on the loss issue. I used Fig. 5 of the paper to extract the following values:

The results obtained are above the red curve, so it seems there is a problem around the loss function.

If I use the results I obtain for a simple optimization (without the generator):

As you pointed out, the loss value of the binarized design is very close to that of the non-binarized one...

jan-david-fischbach commented 1 year ago

Could it be something with the softplus? I just used the one available in JAX. The one in PyTorch seems to have an additional $\beta$ parameter: https://pytorch.org/docs/stable/generated/torch.nn.Softplus.html Other than that, I see little room for error in the equation...
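
For completeness, a PyTorch-style beta-weighted softplus could be reproduced in JAX as sketched below (just to show what that variant would look like; there is no indication the paper actually used one):

import jax
import jax.numpy as jnp

def softplus_beta(x, beta=1.0):
    # PyTorch's Softplus: (1 / beta) * log(1 + exp(beta * x)); beta=1 reduces to jax.nn.softplus.
    return jax.nn.softplus(beta * x) / beta

x = jnp.linspace(-2.0, 2.0, 5)
print(jnp.allclose(softplus_beta(x, 1.0), jax.nn.softplus(x)))  # True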

lucasgrjn commented 1 year ago

It may be the softplus, as you pointed out, but I think the paper implemented the original one. Moreover, they would have mentioned it in the description. (Maybe step 0 is one with a random binarized design and step 1 is one with a fully solid design? Though I don't give a lot of weight to this possibility...)

If the binarized design loss follows the trend of the non-generated one, at least I think we can assume we are safe.

ianwilliamson commented 1 year ago

If it helps, the designs for the mode converter problem (as CSV files) are available here, under the designs/ folder. Designs from steps 134 and 159 of the optimizations in the original paper are available.
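
A minimal way to inspect one of those CSVs with NumPy (the filename below is a placeholder; use the actual names from the designs/ folder, and adjust the delimiter if needed):

import numpy as np

design = np.loadtxt("designs/mode_converter_step_134.csv", delimiter=",")  # placeholder path
print(design.shape, design.min(), design.max())  # expect a 2D density array with values in [0, 1]

The loaded array should then be usable as the design variable for the ceviche-challenges mode converter model.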

lucasgrjn commented 1 year ago

Thanks for the tip. I am going to take a look at it!

jan-david-fischbach commented 1 year ago

I think we can close this one, as the loss function seems to be on track.