Incorrect gradients when relative Gaussian pulse width is not 0.1

ianwilliamson commented 3 years ago

Previously, the first time domain source used in the forward simulation was pulled to construct the source for the adjoint simulation (calculating a scaling factor). With the change in #1515:

https://github.com/NanoComp/meep/blob/0b460cdd24f28b5c9576e9fc2d3854056c4545c5/python/adjoint/objective.py#L96

it seems that there is an error in the gradients when the width of the Gaussian for the forward simulation is different than the 0.1 width that is hardcoded into the function signature here:

https://github.com/NanoComp/meep/blob/0b460cdd24f28b5c9576e9fc2d3854056c4545c5/python/adjoint/objective.py#L239

For some reason this error isn't caught by the Meep unit test which checks the projection of the gradient in a random direction, but it seems that the error appears when individual points in the gradient are validated against finite differences. Have you performed a check of the gradient at individual points within the design against finite differences?

smartalecH commented 3 years ago

An easy way to check is to simply run tutorial 1. It randomly samples several FD gradients and compares to the corresponding adjoint gradient.

If this is true, I wouldn't be surprised, as this weird phase factor has been a killer...

Linking #1535.

smartalecH commented 3 years ago

Ah after looking through it again, it seems like you manually changed the hardcoded badnwidth of 10% for the single frequency case. In theory, this should be totally fine, but there is probably another phase factor hardcoded in that relates to this 10%.

In other words, we just need to refactor a bit to allow the user to specify an arbitrary bandwidth if they want (although for a single frequency case, 10% should be just fine...).

At least that's my initial thought... it will require some time to manually go through it and validate it with the math (again...)

ianwilliamson commented 3 years ago

I believe the gradients will be incorrect in both the single-frequency and broadband cases whenever a Gaussian source of width != 0.1 is used. I think the width that's hardcoded here:

https://github.com/NanoComp/meep/blob/0b460cdd24f28b5c9576e9fc2d3854056c4545c5/python/adjoint/objective.py#L239

probably needs to be removed and the width should be determined based on whatever source(s) were used for the forward simulation.

smartalecH commented 3 years ago

I believe the gradients will be incorrect in both the single-frequency and broadband case

Note that the broadband case uses a custom time profile (that is not a Gaussian) so this fwidth_frac parameter should be completely ignored. That being said, are you still seeing an inconsistency with the broadband case? That would indicate a good place to start tracking down the (potential) bug.

probably needs to be removed and the width should be determined based on whatever source(s) were used for the forward simulation.

For the single frequency case, the only thing that really matters is the center frequency of the gaussian (since that's all you really need to calculate the gradient). Of course, you want a reasonable bandwidth so that the simulation doesn't take forever to run out (or conversely, blow up).

In the past, like you said, we used to simply copy over the forward time profile and just used that. But there were some hidden bugs that were painfully hard to find, so we opted for this approach. There was another reason relating to random susceptibilities, but I'll defer that discussion to @stevengj (since I don't ever use those).

ianwilliamson commented 3 years ago

The scale parameter defined on this line:

https://github.com/NanoComp/meep/blob/0b460cdd24f28b5c9576e9fc2d3854056c4545c5/python/adjoint/objective.py#L69

seems to be used in both branches of the if / else immediately afterwards, which correspond to the single-frequency and broadband cases, respectively.

Additionally, the adj_src_scale() function references the time_src on this line:

https://github.com/NanoComp/meep/blob/0b460cdd24f28b5c9576e9fc2d3854056c4545c5/python/adjoint/objective.py#L271

and uses this to scale both the single-frequency and broadband cases further down in that function (via the DTFT). The time_src attribute gets created by the function mentioned in the comment above on this line:

https://github.com/NanoComp/meep/blob/0b460cdd24f28b5c9576e9fc2d3854056c4545c5/python/adjoint/objective.py#L96

So, yeah. It's difficult to follow because there are multiple pathways across multiple functions, but indeed this does seem to affect both cases.

smartalecH commented 3 years ago

seems to be used in both branches of the if / else immediately afterwards, which correspond to the single-frequency and broadband cases, respectively.

Yes that's expected -- there are constant phase factors that are needed in both cases, which is why the function also has an if-else.

and uses this to scale both the single-frequency and broadband cases further down in that function (via the DTFT)

The broadband implementation only uses adj_src_phase: https://github.com/NanoComp/meep/blob/0b460cdd24f28b5c9576e9fc2d3854056c4545c5/python/adjoint/objective.py#L286

Which is only defined at the center frequency. This is that weird phase factor discussed in #1535. If you can find a way around that factor, that would be great! I just haven't had time to debug it myself (and I haven't seen any issues with my optimization runs).

Have you seen a discrepancy in the broadband gradients?

smartalecH commented 2 years ago

(cc @oskooi )

This still seems to be an issue.

NanoComp / meep

Incorrect gradients when relative Gaussian pulse width is not 0.1 #1541