NanoComp / meep

free finite-difference time-domain (FDTD) software for electromagnetic simulations
GNU General Public License v2.0
1.21k stars 613 forks source link

investigating the error in broadband adjoint gradients #2307

Open oskooi opened 1 year ago

oskooi commented 1 year ago

This is an attempt to investigate the errors in the adjoint gradient of a broadband simulation. The test involves comparing the adjoint gradients using two different configurations:

  1. a multi-frequency adjoint simulation at wavelengths [ $\lambda_1$, $\lambda_2$, $\lambda_3$ ]
  2. a single-frequency adjoint simulation at a wavelength of $\lambda_1$, $\lambda_2$, or $\lambda_3$

The adjoint gradient at wavelength $\lambda_m$ in (1) should match the adjoint gradient $\lambda_m$ in (2) for each $m=1,2,3$. However, we should expect some discrepancy in the results for $\lambda_1$ and $\lambda_3$ (the non-center wavelengths of the pulsed source) because the EigenmodeSource uses the mode profile for the center frequency only. While we have already investigated the errors introduced by the broadband mode source at off-center frequencies in Tutorial/Eigenmode Source, we do not seem to have investigated the effect of these errors on the adjoint gradient.

The test involves two objective functions: reflectance and transmittance of a waveguide mode converter. A schematic of the simulation setup is shown in the first figure below. The design region is random. The results (second figure) demonstrate that the $L_2$ norm of the difference in the adjoint gradient for (1) and (2) is nearly one order of magnitude larger for the non-center wavelengths relative to the center wavelength. The relative error is larger for the reflectance objective function (blue data) than the transmittance objective function (red data). When the absolute magnitude of the gradients in (1) and (2) are compared side by side, they seem to be similar (third and fourth figures).

It would seem that a one order of magnitude reduction in the relative accuracy of the adjoint gradient is large enough that it could degrade the convergence of a topology optimization run.

waveguide_mode_converter_opt

norm_adjoint_gradient

1. Adjoint gradient of reflectance objective function at $\lambda$ = 1.55 μm (center wavelength)

grad_single_vs_multi_frequency_refl_1 55_fcen

2. Adjoint gradient of transmittance objective function at $\lambda$ = 1.55 μm (center wavelength)

grad_single_vs_multi_frequency_tran_1 55_fcen

The script used to generate these results will eventually be added as a separate unit test to test_adjoint_solver.py.

    def test_broadband_gradient(self):
        print("*** TESTING BROADBAND GRADIENT ***")

        multifreq = [1 / 1.58, self.fcen, 1 / 1.53]
        multifreq_val, multifreq_grad = self.adjoint_solver_two_objfunc(
            self.p,
            multifreq,
        )

        mf = 0
        singlefreq = [multifreq[mf]]
        singlefreq_val, singlefreq_grad = self.adjoint_solver_two_objfunc(
            self.p,
            singlefreq,
        )

        import matplotlib
        matplotlib.use('agg')
        import matplotlib.pyplot as plt
        from mpl_toolkits.axes_grid1 import make_axes_locatable

        def place_colorbar():
            divider = make_axes_locatable(plt.gca())
            cax = divider.append_axes("right", size="5%", pad=0.05)
            plt.colorbar(cax=cax)

        # visualize the gradient                                                                                  
        for m in [0, 1]:
            multifreq_slice = mf if m == 0 else mf+3

            fig = plt.figure()

            plt.subplot(1,2,1)
            plt.imshow(
                np.abs(multifreq_grad[:, multifreq_slice].reshape(self.Nx, self.Ny)),
                interpolation='none',
                cmap='RdBu'
            )
            plt.title("multi frequency")
            place_colorbar()

            plt.subplot(1,2,2)
            plt.imshow(
                np.abs(singlefreq_grad[:, m].reshape(self.Nx, self.Ny)),
                interpolation='none',
                cmap='RdBu'
            )
            plt.title("single frequency")
            place_colorbar()

            if m == 0:
                t = "refl"
            else:
                t = "tran"

            plt.subplots_adjust(wspace=0.4)
            plt.savefig(
                f"grad_single_vs_multi_frequency_{t}.png",
                dpi=150,
                bbox_inches="tight",
            )

            norm = np.linalg.norm(multifreq_grad[:, multifreq_slice] -
                                  singlefreq_grad[:, m])
            print(f"norm:, {t}, {norm:.6f}")
oskooi commented 1 year ago

Here is an improved test which compares the adjoint gradients from the multi- vs. single-frequency simulations for five equally spaced frequencies spanning the full range of the source bandwidth. The gradient comparison uses the assertClose method which is based on the $L_\infty$ norm.

In order for this test to pass (using Meep compiled with double-precision floating point and a grid resolution of 50), the relative tolerance parameter of assertClose (tol_grad in the script) must be at least 1.1. A relative tolerance greater than one is an indication of unacceptably large errors in the broadband adjoint gradients.

    def test_broadband_gradient(self):
        print("*** TESTING BROADBAND GRADIENT ***")

        nfrq = 5
        multifreq = np.linspace(
            self.fcen - 0.5*self.df,
            self.fcen + 0.5*self.df,
            nfrq,
        )
        multifreq_val, multifreq_grad = self.adjoint_solver_two_objfunc(
            self.p,
            multifreq,
        )

        tol_objfunc = 0.01 # relative tolerance for objective function comparison
        tol_grad = 1.1 # relative tolerance for adjoint gradient comparison
        for n in range(nfrq):
            frq = multifreq[n]
            singlefreq_val, singlefreq_grad = self.adjoint_solver_two_objfunc(
                self.p,
                [frq],
            )
            for m in [0, 1]:
                s = n if m == 0 else n+nfrq
                self.assertClose(singlefreq_val[m], multifreq_val[s], tol_objfunc)
                self.assertClose(singlefreq_grad[:, m], multifreq_grad[:, s], tol_grad)
                print(f"PASSED: frequency={frq}, m={m}.")

Importantly, if the frequency range is reduced to 40% of the source bandwidth, the test passes with a smaller relative tolerance of tol_grad = 0.01. This is an indication that adjoint gradients computed at frequencies near the edge of the source bandwidth are the least accurate.

        multifreq = np.linspace(
            self.fcen - 0.2*self.df,
            self.fcen + 0.2*self.df,
            nfrq,
        )

To summarize, the main finding here is that in the current setup of the adjoint solver, accurate adjoint gradients over a broad bandwidth can only be obtained if the monitor bandwidth is significantly smaller than the source bandwidth.

oskooi commented 1 year ago

Here is a test for the bandwidth of the eigenmode source for a single-frequency calculation of the EigenmodeCoefficient objective function. The test verifies that the adjoint gradient computed at the center frequency of a pulsed source is the same regardless of the source bandwidth. Unfortunately, the adjoint gradient does seem to depend on the source bandwidth because a non-negligible relative tolerance (tol_grad = 0.05) is necessary to ensure the assertClose comparison passes for all the different source bandwidths tested.

These results indicate that the accuracy of the adjoint gradient depends on the source bandwidth.

    def test_mode_source_bandwidth(self):
    print("*** TESTING MODE SOURCE BANDWIDTH ***")

        # reference values for objective function and adjoint gradient
        # computed using the default source bandwidth self.df  
        ref_objf, ref_grad = self.adjoint_solver_two_objfunc(
            self.p,
            [self.fcen],
        )

    nfw = 7
    fw = np.linspace(
            0.05,
            0.65,
            nfw,
        )

        tol_objfunc = 1e-5 # relative tolerance for objective function comparison
        tol_grad = 0.05    # relative tolerance for adjoint gradient comparison
        for n in range(nfw):
            fwidth = fw[n] * self.fcen
            if fwidth != self.df:
                self.mode_source = [
                    mp.EigenModeSource(
                        src=mp.GaussianSource(self.fcen, fwidth=fwidth),
                        center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
                        size=mp.Vector3(0, self.sxy - 2 * self.dpml),
                        eig_parity=self.eig_parity,
                    )
                ]

                objf, grad = self.adjoint_solver_two_objfunc(
            self.p,
                    [self.fcen],
                )

                for m in [0, 1]:
                    self.assertClose(ref_objf[m], objf[m], tol_objfunc)
                    self.assertClose(ref_grad[:, m], grad[:, m], tol_grad)
                    print(f"PASSED: fwidth={fwidth:.5f}, m={m}.")
oskooi commented 1 year ago

It seems that the adjoint gradient computed at the center frequency of a broadband source becomes less accurate as the source bandwidth is increased.

This is demonstrated using a test which computes the error in the directional derivative from the adjoint gradient and the brute-force finite-difference approximation. The trend in the relative error vs. source bandwidth plot is similar for both reflectance and transmittance monitors and for different resolutions.

This finding could mean that the choice of the tolerance parameter used in the assertClose statements of the existing unit tests in test_adjoint_solver.py for comparing the adjoint gradient with the finite-difference results have been arbitrary because the relative error depends on the source bandwidth which itself was chosen arbitrarily:

https://github.com/NanoComp/meep/blob/d100be405292c1a3f33b05642a0333ac7c50db87/python/tests/test_adjoint_solver.py#L63-L72

On line 64, simply changing cls.df = 0.2 * cls.fcen to e.g. cls.df = 0.4 * cls.fcen causes some of the tests to fail.

adjgrad_err_vs_source_bandwidth_refl

adjgrad_err_vs_source_bandwidth_tran

    def test_mode_source_bandwidth(self):
        print("*** TESTING MODE SOURCE BANDWIDTH ***")

    # compute objective value for unperturbed design                                                          
    unperturbed_objf = self.adjoint_solver_two_objfunc(
        self.p,
        [self.fcen],
            need_gradient=False,
        )

    # compute objective value for perturbed design                                                            
        perturbed_objf = self.adjoint_solver_two_objfunc(
        self.p + self.dp,
            [self.fcen],
        need_gradient=False,
    )

    fnd_dd = perturbed_objf - unperturbed_objf
    print(f"fnd_dd:, {fnd_dd}")

    nfw = 11
    fw = np.linspace(
        0.05,
        0.95,
            nfw,
    )

    for n in range(nfw):
            fwidth = fw[n] * self.fcen
        self.mode_source = [
            mp.EigenModeSource(
                src=mp.GaussianSource(self.fcen, fwidth=fwidth),
                    center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
                size=mp.Vector3(0, self.sxy - 2 * self.dpml),
                eig_parity=self.eig_parity,
                )
        ]

        objf, grad = self.adjoint_solver_two_objfunc(
                self.p,
            [self.fcen],
            )

        for m in [0, 1]:
            adj_dd = (self.dp[None, :] @ grad[:, m]).flatten()
                rel_err = abs((fnd_dd[m] - adj_dd[0]) / fnd_dd[m])                                        
        print(f"grad:, {fwidth:.5f}, {m}, {adj_dd[0]}, {rel_err}")
stevengj commented 1 year ago

Double-check that the run time is the same for all the bandwidths?

smartalecH commented 1 year ago

I would do what @stevengj suggested first... but in addition we should carefully go over the "fudge factor" we currently have implemented in the codebase. In particular, my thesis derives the proper fudge factor (chapter 3.2.2):

image

Currently, we apply our fudge factor on the adjoint sources (which should be fine, thanks to linearity). However, it would be cleaner to refactor this and apply this on the recombination step of the adjoint solver.

MarkMa1990 commented 1 year ago

that's exactly the problem #(https://github.com/NanoComp/meep/issues/2106) I have bumped with several time ago, I think the problem comes from the turn-on effect from Gaussian function where you put abs()> cutoff even in the beginning, which explains the initial phase uncertainty while changing bandwidth.

file: https://github.com/NanoComp/meep/blob/master/src/sources.cpp

""" complex gaussian_src_time::dipole(double time) const { double tt = time - peak_time; if (float(fabs(tt)) > cutoff) return 0.0; """

Solution: just remove fabs function, so that only cutoff happens in tail of gaussian src profile.

oskooi commented 1 year ago

Double-check that the run time is the same for all the bandwidths?

Following @stevengj's suggestion and using a fixed runtime of 120 for the forward and adjoint simulations produces results which more clearly demonstrate the loss of accuracy in the adjoint gradient with increasing source bandwidth for a single-frequency calculation.

For these results, the accuracy of the smallest source bandwidth Δf = 0.05fcen is reduced by nearly one to two orders of magnitude for Δf > 0.2fcen. The most noticeable change of the fixed runtime is for the relative error to remain unchanged for Δf > 0.2fcen.

Based on these results, it would seem that until we have a fix, we should be using a source bandwidth of Δfsrc < ~0.05fcen and a monitor bandwidth of Δfmon < ~0.2Δfsrc to ensure the highest possible accuracy for the adjoint gradients.

adjgrad_err_vs_source_bandwidth_refl_rt120

adjgrad_err_vs_source_bandwidth_tran_rt120

smartalecH commented 1 year ago

Based on these results, it would seem that until we have a fix, we should be using a source bandwidth of Δfsrc < ~0.05fcen and a monitor bandwidth of Δfmon < ~0.2Δfsrc to ensure the highest possible accuracy for the adjoint gradients.

Unfortunately this heuristic doesn't generalize to other problems... especially if the dominate source of error is due to the lack of accounting for modal dispersion.

smartalecH commented 1 year ago

I think the problem comes from the turn-on effect from Gaussian function where you put abs()> cutoff even in the beginning, which explains the initial phase uncertainty while changing bandwidth.

Hmm possibly, but I don't think so. For one, the "clipping" occurs in the tails of the Gaussian. In other words, it just convolves the frequency response with a narrow sinc, which will have a minimal phase perturbation. Even if you remove that element of the code, you'll still have the effect (albeit with an even narrower sinc) as you can't run your simulation forever (as a true Gaussian would require)...

More importantly, we explicitly account for this by performing a manual dtft on the time source to properly normalize it out in the adjoint run. In other words, you should be able to drive your forward run with any arbitrary profile, and the adjoint solver should "just work" thanks to this brute force normalization.

smartalecH commented 1 year ago

Following @stevengj's suggestion and using a fixed runtime of 120

I know we're in the weeds here, and the issue is most likely either due to the fact we aren't calculating the modes at all frequencies, or that we have a bug in the fudge factor, but are you sure you ran it long enough?

smartalecH commented 1 year ago

Okay one way to rule out the fudge factor issue: just swap out the mode monitors for dft monitors. If you see the same gradient discrepancy across a broad bandwidth, then you know that there is something else wrong with the fudge factor (or normalization).

If the gradients in this case are consistent, then the dominant source of error is most likely the modal dispersion (for which we have an "easy" solution).

oskooi commented 1 year ago

but are you sure you ran it long enough?

It turns out that a fixed runtime of 120 was indeed too short for the smallest source bandwidths Δf = 0.05fcen shown previously. The results were not yet converged. Nevertheless, increasing the fixed runtime from 120 to 500 (for which the results are converged) for the single-frequency adjoint gradient calculations does not affect the variation of the accuracy of the adjoint gradient with the source bandwidth.

As an additional demonstration, here are the results for the waveguide mode converter test problem using a decay_by=1e-13 in the OptimizationProblem class constructor (the default is 1e-11). In this example, the source bandwidth Δfsrc is restricted to the range of [0.05, 0.40] fcen which is smaller than the range [0.05, 0.95] fcen of the results shown previously.

What is interesting is that the variation of the error only becomes apparent at the largest resolutions (60 and 120). At the smallest resolution of 30 (which is what we are using in test_adjoint_solver.py), the error seems to be independent of the source bandwidth. This could be why we have not noticed this effect until now.

Two main items to note in these results:

  1. The error in the adjoint gradient of the transmission monitor (second figure) varies by nearly two orders of magnitude at a resolution of 60 and even more at a resolution of 120.
  2. Reducing the source bandwidth does not necessarily produce the highest accuracy (contrary to what I had previously claimed).

adjgrad_err_vs_source_bandwidth_refl

adjgrad_err_vs_source_bandwidth_tran

oskooi commented 1 year ago

edit (11/28): more details on test setup and additional results.

Okay one way to rule out the fudge factor issue: just swap out the mode monitors for dft monitors. If you see the same gradient discrepancy across a broad bandwidth, then you know that there is something else wrong with the fudge factor (or normalization).

Good suggestion, @smartalecH. I tried this out and it seems that the relative error in the broadband adjoint gradient of a DFT fields monitor objective function does increase with distance from the center frequency of the pulsed source. In the results shown in the figure, the center wavelength of the pulsed source is 1.55 μm for which the error is a minimum. As a check, the results are consistent at three different resolutions.

We already know that there is an error in the broadband adjoint gradients due to the eigenmode source (#2324). Based on these results, it seems there is an additional error from the fudge factor.

norm_adjoint_gradient_dft

    def test_multifreq_monitor(self):
    """Verifies that the individual adjoint gradients from a multifrequency                                          
        DFT fields monitor are equivalent to a single-frequency monitor."""
        print("*** TESTING MULTIFREQUENCY MONITOR ***")

        nfrq = 11
        frqs = np.linspace(
            self.fcen - 0.2 * self.df,
            self.fcen + 0.2 * self.df,
            nfrq,
        )
        multifrq_val, multifrq_grad = self.adjoint_solver(
            self.p,
            MonitorObject.DFT,
            frqs,
        )

    for n in range(nfrq):
            frq = frqs[n]
            self.fcen = frq
            self.df = 0.05*self.fcen
            self.mode_source = [
                mp.EigenModeSource(
                    src=mp.GaussianSource(self.fcen, fwidth=self.df),
                    center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
                    size=mp.Vector3(0, self.sxy - 2 * self.dpml),
                    eig_parity=self.eig_parity,
        )
            ]
            singlefrq_val, singlefrq_grad = self.adjoint_solver(
        self.p,
        MonitorObject.DFT,
        [frq],
            )
            rel_err = np.abs(singlefrq_val[0] - multifrq_val[n]) / singlefrq_val[0]
            print(f"objfunc:, {singlefrq_val[0]:.8f}, {multifrq_val[n]:.8f}, {rel_err:.6f}")
            rel_err = np.linalg.norm(singlefrq_grad - multifrq_grad[:, n])
            print(f"grad:, {frq:.5f}, {rel_err:.10f}")

For comparison, here are results for the same test but using an eigenmode monitor objective function. The same trend of increasing error from the center frequency of the source is apparent.

norm_adjoint_gradient_eigmode

    def test_multifreq_monitor(self):
        """Verifies that the individual adjoint gradients from a multifrequency                                            
        eigenmode monitor are equivalent to a single-frequency monitor."""
        print("*** TESTING MULTIFREQUENCY MONITOR ***")

        nfrq = 11
        frqs = np.linspace(
            self.fcen - 0.2 * self.df,
            self.fcen + 0.2 * self.df,
            nfrq,
        )
        multifrq_val, multifrq_grad = self.adjoint_solver(
            self.p,
            MonitorObject.EIGENMODE,
            frqs,
        )

        for n in range(nfrq):
            frq = frqs[n]
            self.fcen = frq
            self.df = 0.05*self.fcen
            self.mode_source = [
                mp.EigenModeSource(
                    src=mp.GaussianSource(self.fcen, fwidth=self.df),
                    center=mp.Vector3(-0.5 * self.sxy + self.dpml, 0),
                    size=mp.Vector3(0, self.sxy - 2 * self.dpml),
                    eig_parity=self.eig_parity,
                )
            ]
            singlefrq_val, singlefrq_grad = self.adjoint_solver(
                self.p,
                MonitorObject.EIGENMODE,
                [frq],
            )
            rel_err = np.abs(singlefrq_val[0] - multifrq_val[n]) / singlefrq_val[0]
            print(f"objfunc:, {singlefrq_val[0]:.8f}, {multifrq_val[n]:.8f}, {rel_err:.6f}")
            rel_err = np.linalg.norm(singlefrq_grad - multifrq_grad[:, n])
            print(f"grad:, {frq:.5f}, {rel_err:.10f}")

The function adjoint_solver has been modified such that the objective function is a dimensionless quantity involving normalization of the DFT fields or mode coefficient by the input flux from the eigenmode source:

        # Compute the incident fields of the mode source                                                                   
        # in the straight waveguide for use as normalization                                                               
        # of the reflectance (S11) measurement.                                                                            
        ref_sim = mp.Simulation(
            resolution=self.resolution,
            cell_size=self.cell_size,
            boundary_layers=self.pml_xy,
            sources=self.mode_source,
            geometry=self.waveguide_geometry,
        )
        dft_mon = ref_sim.add_mode_monitor(
            frequencies,
            mp.ModeRegion(
                center=mp.Vector3(0.5 * self.sxy - self.dpml),
                size=mp.Vector3(0, self.sxy - 2 * self.dpml, 0),
            ),
            yee_grid=True,
        )
        ref_sim.run(until_after_sources=20)
        input_flux = np.array(mp.get_fluxes(dft_mon))

        if mon_type.name == "EIGENMODE":
            obj_list = [
                mpa.EigenmodeCoefficient(
                    sim,
                    mp.Volume(
                        center=mp.Vector3(0.5 * self.sxy - self.dpml),
                        size=mp.Vector3(0, self.sxy - 2 * self.dpml, 0),
                    ),
                    1,
                    eig_parity=self.eig_parity,
                )
            ]

            def J(tran_mon):
                return npa.power(npa.abs(tran_mon), 2) / input_flux

        elif mon_type.name == "DFT":
            obj_list = [
                mpa.FourierFields(
                    sim,
                    mp.Volume(center=mp.Vector3(1.25), size=mp.Vector3(0.25, 1, 0)),
                    self.src_cmpt,
                )
            ]

            def J(mode_mon):
                return npa.power(npa.abs(mode_mon[:, 4, 10]), 2) / input_flux
stevengj commented 1 year ago

I think looking as a function of resolution may be a red herring here — the narrow-band (single-frequency) and broadband gradient calculations should match even at a low resolution (i.e. regardless of whether we are modeling Maxwell's equations or a "discrete wave" system), since it is a consequence of linearity alone. (The only things that should spoil it are truncation due to the finite runtime and floating-point roundoff, so the match should improve with runtime until it hits the floating-point limit.)

Be sure to compute the relative error in the gradients, by the way, e.g. ‖narrow - broad‖₂ / ‖narrow‖₂.