Open oskooi opened 1 year ago
Here is an improved test which compares the adjoint gradients from the multi- vs. single-frequency simulations for five equally spaced frequencies spanning the full range of the source bandwidth. The gradient comparison uses the assertClose
method which is based on the $L_\infty$ norm.
In order for this test to pass (using Meep compiled with double-precision floating point and a grid resolution of 50), the relative tolerance parameter of assertClose
(tol_grad
in the script) must be at least 1.1. A relative tolerance greater than one is an indication of unacceptably large errors in the broadband adjoint gradients.
def test_broadband_gradient(self):
print("*** TESTING BROADBAND GRADIENT ***")
nfrq = 5
multifreq = np.linspace(
self.fcen - 0.5*self.df,
self.fcen + 0.5*self.df,
nfrq,
)
multifreq_val, multifreq_grad = self.adjoint_solver_two_objfunc(
self.p,
multifreq,
)
tol_objfunc = 0.01 # relative tolerance for objective function comparison
tol_grad = 1.1 # relative tolerance for adjoint gradient comparison
for n in range(nfrq):
frq = multifreq[n]
singlefreq_val, singlefreq_grad = self.adjoint_solver_two_objfunc(
self.p,
[frq],
)
for m in [0, 1]:
s = n if m == 0 else n+nfrq
self.assertClose(singlefreq_val[m], multifreq_val[s], tol_objfunc)
self.assertClose(singlefreq_grad[:, m], multifreq_grad[:, s], tol_grad)
print(f"PASSED: frequency={frq}, m={m}.")
Importantly, if the frequency range is reduced to 40% of the source bandwidth, the test passes with a smaller relative tolerance of tol_grad = 0.01
. This is an indication that adjoint gradients computed at frequencies near the edge of the source bandwidth are the least accurate.
multifreq = np.linspace(
self.fcen - 0.2*self.df,
self.fcen + 0.2*self.df,
nfrq,
)
To summarize, the main finding here is that in the current setup of the adjoint solver, accurate adjoint gradients over a broad bandwidth can only be obtained if the monitor bandwidth is significantly smaller than the source bandwidth.
Here is a test for the bandwidth of the eigenmode source for a single-frequency calculation of the EigenmodeCoefficient
objective function. The test verifies that the adjoint gradient computed at the center frequency of a pulsed source is the same regardless of the source bandwidth. Unfortunately, the adjoint gradient does seem to depend on the source bandwidth because a non-negligible relative tolerance (tol_grad = 0.05
) is necessary to ensure the assertClose
comparison passes for all the different source bandwidths tested.
These results indicate that the accuracy of the adjoint gradient depends on the source bandwidth.
def test_mode_source_bandwidth(self):
print("*** TESTING MODE SOURCE BANDWIDTH ***")
# reference values for objective function and adjoint gradient
# computed using the default source bandwidth self.df
ref_objf, ref_grad = self.adjoint_solver_two_objfunc(
self.p,
[self.fcen],
)
nfw = 7
fw = np.linspace(
0.05,
0.65,
nfw,
)
tol_objfunc = 1e-5 # relative tolerance for objective function comparison
tol_grad = 0.05 # relative tolerance for adjoint gradient comparison
for n in range(nfw):
fwidth = fw[n] * self.fcen
if fwidth != self.df:
self.mode_source = [
mp.EigenModeSource(
src=mp.GaussianSource(self.fcen, fwidth=fwidth),
center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
size=mp.Vector3(0, self.sxy - 2 * self.dpml),
eig_parity=self.eig_parity,
)
]
objf, grad = self.adjoint_solver_two_objfunc(
self.p,
[self.fcen],
)
for m in [0, 1]:
self.assertClose(ref_objf[m], objf[m], tol_objfunc)
self.assertClose(ref_grad[:, m], grad[:, m], tol_grad)
print(f"PASSED: fwidth={fwidth:.5f}, m={m}.")
It seems that the adjoint gradient computed at the center frequency of a broadband source becomes less accurate as the source bandwidth is increased.
This is demonstrated using a test which computes the error in the directional derivative from the adjoint gradient and the brute-force finite-difference approximation. The trend in the relative error vs. source bandwidth plot is similar for both reflectance and transmittance monitors and for different resolutions.
This finding could mean that the choice of the tolerance parameter used in the assertClose
statements of the existing unit tests in test_adjoint_solver.py
for comparing the adjoint gradient with the finite-difference results have been arbitrary because the relative error depends on the source bandwidth which itself was chosen arbitrarily:
On line 64, simply changing cls.df = 0.2 * cls.fcen
to e.g. cls.df = 0.4 * cls.fcen
causes some of the tests to fail.
def test_mode_source_bandwidth(self):
print("*** TESTING MODE SOURCE BANDWIDTH ***")
# compute objective value for unperturbed design
unperturbed_objf = self.adjoint_solver_two_objfunc(
self.p,
[self.fcen],
need_gradient=False,
)
# compute objective value for perturbed design
perturbed_objf = self.adjoint_solver_two_objfunc(
self.p + self.dp,
[self.fcen],
need_gradient=False,
)
fnd_dd = perturbed_objf - unperturbed_objf
print(f"fnd_dd:, {fnd_dd}")
nfw = 11
fw = np.linspace(
0.05,
0.95,
nfw,
)
for n in range(nfw):
fwidth = fw[n] * self.fcen
self.mode_source = [
mp.EigenModeSource(
src=mp.GaussianSource(self.fcen, fwidth=fwidth),
center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
size=mp.Vector3(0, self.sxy - 2 * self.dpml),
eig_parity=self.eig_parity,
)
]
objf, grad = self.adjoint_solver_two_objfunc(
self.p,
[self.fcen],
)
for m in [0, 1]:
adj_dd = (self.dp[None, :] @ grad[:, m]).flatten()
rel_err = abs((fnd_dd[m] - adj_dd[0]) / fnd_dd[m])
print(f"grad:, {fwidth:.5f}, {m}, {adj_dd[0]}, {rel_err}")
Double-check that the run time is the same for all the bandwidths?
I would do what @stevengj suggested first... but in addition we should carefully go over the "fudge factor" we currently have implemented in the codebase. In particular, my thesis derives the proper fudge factor (chapter 3.2.2):
Currently, we apply our fudge factor on the adjoint sources (which should be fine, thanks to linearity). However, it would be cleaner to refactor this and apply this on the recombination step of the adjoint solver.
that's exactly the problem #(https://github.com/NanoComp/meep/issues/2106) I have bumped with several time ago, I think the problem comes from the turn-on effect from Gaussian function where you put abs()> cutoff even in the beginning, which explains the initial phase uncertainty while changing bandwidth.
file: https://github.com/NanoComp/meep/blob/master/src/sources.cpp
"""
complex
Solution: just remove fabs function, so that only cutoff happens in tail of gaussian src profile.
Double-check that the run time is the same for all the bandwidths?
Following @stevengj's suggestion and using a fixed runtime of 120 for the forward and adjoint simulations produces results which more clearly demonstrate the loss of accuracy in the adjoint gradient with increasing source bandwidth for a single-frequency calculation.
For these results, the accuracy of the smallest source bandwidth Δf = 0.05fcen is reduced by nearly one to two orders of magnitude for Δf > 0.2fcen. The most noticeable change of the fixed runtime is for the relative error to remain unchanged for Δf > 0.2fcen.
Based on these results, it would seem that until we have a fix, we should be using a source bandwidth of Δfsrc < ~0.05fcen and a monitor bandwidth of Δfmon < ~0.2Δfsrc to ensure the highest possible accuracy for the adjoint gradients.
Based on these results, it would seem that until we have a fix, we should be using a source bandwidth of Δfsrc < ~0.05fcen and a monitor bandwidth of Δfmon < ~0.2Δfsrc to ensure the highest possible accuracy for the adjoint gradients.
Unfortunately this heuristic doesn't generalize to other problems... especially if the dominate source of error is due to the lack of accounting for modal dispersion.
I think the problem comes from the turn-on effect from Gaussian function where you put abs()> cutoff even in the beginning, which explains the initial phase uncertainty while changing bandwidth.
Hmm possibly, but I don't think so. For one, the "clipping" occurs in the tails of the Gaussian. In other words, it just convolves the frequency response with a narrow sinc, which will have a minimal phase perturbation. Even if you remove that element of the code, you'll still have the effect (albeit with an even narrower sinc) as you can't run your simulation forever (as a true Gaussian would require)...
More importantly, we explicitly account for this by performing a manual dtft on the time source to properly normalize it out in the adjoint run. In other words, you should be able to drive your forward run with any arbitrary profile, and the adjoint solver should "just work" thanks to this brute force normalization.
Following @stevengj's suggestion and using a fixed runtime of 120
I know we're in the weeds here, and the issue is most likely either due to the fact we aren't calculating the modes at all frequencies, or that we have a bug in the fudge factor, but are you sure you ran it long enough?
Okay one way to rule out the fudge factor issue: just swap out the mode monitors for dft monitors. If you see the same gradient discrepancy across a broad bandwidth, then you know that there is something else wrong with the fudge factor (or normalization).
If the gradients in this case are consistent, then the dominant source of error is most likely the modal dispersion (for which we have an "easy" solution).
but are you sure you ran it long enough?
It turns out that a fixed runtime of 120 was indeed too short for the smallest source bandwidths Δf = 0.05fcen shown previously. The results were not yet converged. Nevertheless, increasing the fixed runtime from 120 to 500 (for which the results are converged) for the single-frequency adjoint gradient calculations does not affect the variation of the accuracy of the adjoint gradient with the source bandwidth.
As an additional demonstration, here are the results for the waveguide mode converter test problem using a decay_by=1e-13
in the OptimizationProblem
class constructor (the default is 1e-11). In this example, the source bandwidth Δfsrc is restricted to the range of [0.05, 0.40] fcen which is smaller than the range [0.05, 0.95] fcen of the results shown previously.
What is interesting is that the variation of the error only becomes apparent at the largest resolutions (60 and 120). At the smallest resolution of 30 (which is what we are using in test_adjoint_solver.py), the error seems to be independent of the source bandwidth. This could be why we have not noticed this effect until now.
Two main items to note in these results:
edit (11/28): more details on test setup and additional results.
Okay one way to rule out the fudge factor issue: just swap out the mode monitors for dft monitors. If you see the same gradient discrepancy across a broad bandwidth, then you know that there is something else wrong with the fudge factor (or normalization).
Good suggestion, @smartalecH. I tried this out and it seems that the relative error in the broadband adjoint gradient of a DFT fields monitor objective function does increase with distance from the center frequency of the pulsed source. In the results shown in the figure, the center wavelength of the pulsed source is 1.55 μm for which the error is a minimum. As a check, the results are consistent at three different resolutions.
We already know that there is an error in the broadband adjoint gradients due to the eigenmode source (#2324). Based on these results, it seems there is an additional error from the fudge factor.
def test_multifreq_monitor(self):
"""Verifies that the individual adjoint gradients from a multifrequency
DFT fields monitor are equivalent to a single-frequency monitor."""
print("*** TESTING MULTIFREQUENCY MONITOR ***")
nfrq = 11
frqs = np.linspace(
self.fcen - 0.2 * self.df,
self.fcen + 0.2 * self.df,
nfrq,
)
multifrq_val, multifrq_grad = self.adjoint_solver(
self.p,
MonitorObject.DFT,
frqs,
)
for n in range(nfrq):
frq = frqs[n]
self.fcen = frq
self.df = 0.05*self.fcen
self.mode_source = [
mp.EigenModeSource(
src=mp.GaussianSource(self.fcen, fwidth=self.df),
center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
size=mp.Vector3(0, self.sxy - 2 * self.dpml),
eig_parity=self.eig_parity,
)
]
singlefrq_val, singlefrq_grad = self.adjoint_solver(
self.p,
MonitorObject.DFT,
[frq],
)
rel_err = np.abs(singlefrq_val[0] - multifrq_val[n]) / singlefrq_val[0]
print(f"objfunc:, {singlefrq_val[0]:.8f}, {multifrq_val[n]:.8f}, {rel_err:.6f}")
rel_err = np.linalg.norm(singlefrq_grad - multifrq_grad[:, n])
print(f"grad:, {frq:.5f}, {rel_err:.10f}")
For comparison, here are results for the same test but using an eigenmode monitor objective function. The same trend of increasing error from the center frequency of the source is apparent.
def test_multifreq_monitor(self):
"""Verifies that the individual adjoint gradients from a multifrequency
eigenmode monitor are equivalent to a single-frequency monitor."""
print("*** TESTING MULTIFREQUENCY MONITOR ***")
nfrq = 11
frqs = np.linspace(
self.fcen - 0.2 * self.df,
self.fcen + 0.2 * self.df,
nfrq,
)
multifrq_val, multifrq_grad = self.adjoint_solver(
self.p,
MonitorObject.EIGENMODE,
frqs,
)
for n in range(nfrq):
frq = frqs[n]
self.fcen = frq
self.df = 0.05*self.fcen
self.mode_source = [
mp.EigenModeSource(
src=mp.GaussianSource(self.fcen, fwidth=self.df),
center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
size=mp.Vector3(0, self.sxy - 2 * self.dpml),
eig_parity=self.eig_parity,
)
]
singlefrq_val, singlefrq_grad = self.adjoint_solver(
self.p,
MonitorObject.EIGENMODE,
[frq],
)
rel_err = np.abs(singlefrq_val[0] - multifrq_val[n]) / singlefrq_val[0]
print(f"objfunc:, {singlefrq_val[0]:.8f}, {multifrq_val[n]:.8f}, {rel_err:.6f}")
rel_err = np.linalg.norm(singlefrq_grad - multifrq_grad[:, n])
print(f"grad:, {frq:.5f}, {rel_err:.10f}")
The function adjoint_solver
has been modified such that the objective function is a dimensionless quantity involving normalization of the DFT fields or mode coefficient by the input flux from the eigenmode source:
# Compute the incident fields of the mode source
# in the straight waveguide for use as normalization
# of the reflectance (S11) measurement.
ref_sim = mp.Simulation(
resolution=self.resolution,
cell_size=self.cell_size,
boundary_layers=self.pml_xy,
sources=self.mode_source,
geometry=self.waveguide_geometry,
)
dft_mon = ref_sim.add_mode_monitor(
frequencies,
mp.ModeRegion(
center=mp.Vector3(0.5 * self.sxy - self.dpml),
size=mp.Vector3(0, self.sxy - 2 * self.dpml, 0),
),
yee_grid=True,
)
ref_sim.run(until_after_sources=20)
input_flux = np.array(mp.get_fluxes(dft_mon))
if mon_type.name == "EIGENMODE":
obj_list = [
mpa.EigenmodeCoefficient(
sim,
mp.Volume(
center=mp.Vector3(0.5 * self.sxy - self.dpml),
size=mp.Vector3(0, self.sxy - 2 * self.dpml, 0),
),
1,
eig_parity=self.eig_parity,
)
]
def J(tran_mon):
return npa.power(npa.abs(tran_mon), 2) / input_flux
elif mon_type.name == "DFT":
obj_list = [
mpa.FourierFields(
sim,
mp.Volume(center=mp.Vector3(1.25), size=mp.Vector3(0.25, 1, 0)),
self.src_cmpt,
)
]
def J(mode_mon):
return npa.power(npa.abs(mode_mon[:, 4, 10]), 2) / input_flux
I think looking as a function of resolution may be a red herring here — the narrow-band (single-frequency) and broadband gradient calculations should match even at a low resolution (i.e. regardless of whether we are modeling Maxwell's equations or a "discrete wave" system), since it is a consequence of linearity alone. (The only things that should spoil it are truncation due to the finite runtime and floating-point roundoff, so the match should improve with runtime until it hits the floating-point limit.)
Be sure to compute the relative error in the gradients, by the way, e.g. ‖narrow - broad‖₂ / ‖narrow‖₂
.
This is an attempt to investigate the errors in the adjoint gradient of a broadband simulation. The test involves comparing the adjoint gradients using two different configurations:
The adjoint gradient at wavelength $\lambda_m$ in (1) should match the adjoint gradient $\lambda_m$ in (2) for each $m=1,2,3$. However, we should expect some discrepancy in the results for $\lambda_1$ and $\lambda_3$ (the non-center wavelengths of the pulsed source) because the
EigenmodeSource
uses the mode profile for the center frequency only. While we have already investigated the errors introduced by the broadband mode source at off-center frequencies in Tutorial/Eigenmode Source, we do not seem to have investigated the effect of these errors on the adjoint gradient.The test involves two objective functions: reflectance and transmittance of a waveguide mode converter. A schematic of the simulation setup is shown in the first figure below. The design region is random. The results (second figure) demonstrate that the $L_2$ norm of the difference in the adjoint gradient for (1) and (2) is nearly one order of magnitude larger for the non-center wavelengths relative to the center wavelength. The relative error is larger for the reflectance objective function (blue data) than the transmittance objective function (red data). When the absolute magnitude of the gradients in (1) and (2) are compared side by side, they seem to be similar (third and fourth figures).
It would seem that a one order of magnitude reduction in the relative accuracy of the adjoint gradient is large enough that it could degrade the convergence of a topology optimization run.
1. Adjoint gradient of reflectance objective function at $\lambda$ = 1.55 μm (center wavelength)
2. Adjoint gradient of transmittance objective function at $\lambda$ = 1.55 μm (center wavelength)
The script used to generate these results will eventually be added as a separate unit test to
test_adjoint_solver.py
.