NanoComp / meep

free finite-difference time-domain (FDTD) software for electromagnetic simulations
GNU General Public License v2.0
1.19k stars 607 forks source link

Inconsistent finite-difference and adjoint gradients for a simple 3D example #2578

Open rafael-fuente opened 1 year ago

rafael-fuente commented 1 year ago

I'm testing a simple 3D optimization setup, shown in the image below, consisting of a source plane, a design layer, and a monitor plane whose objective function is simply the flux of one polarization. The setup is periodic in the X and Y axes, and the Z axis contains two PMLs placed at the top and bottom boundaries.

Screenshot 2023-07-12 024227

Meep version I'm using is 1.27.0 (parallel), but I get similar results with older versions.

For simplicity, I set up the design region resolution exactly equal to the FDTD resolution, which I randomly initialize and then apply a Gaussian filter to smooth it. However, I'm unable to get consistent finite-difference and adjoint gradients no matter what is the simulation resolution or the run time.

I attached the code used below:

import numpy as np
import meep as mp
import scipy
import matplotlib.pyplot as plt
import meep.adjoint as mpa
from autograd import numpy as npa
from scipy.ndimage import gaussian_filter

resolution  = 20

sx = 2.0
sy = 2.0

d_pml = 0.5
thick_air = 0.5
thick_active = 0.5

mat_active = mp.Medium(index=1.4)
mat_air = mp.Medium(index=1.0)

pml_layers = [mp.PML(thickness=d_pml,direction=mp.Z)]
k_boundary = mp.Vector3(0, 0,  0)

sz = d_pml+thick_air+thick_active+thick_air+d_pml
cell_size = mp.Vector3(sx,sy,sz)

# set up sources
wvl_cen = 0.940
frq_cen = 1/wvl_cen
dfrq = 0.2*frq_cen
source_center = mp.Vector3(0.0,0,-0.5*sz+0.5*thick_air+d_pml)
source_size = mp.Vector3(sx,sy,0)
sources = [mp.Source(mp.GaussianSource(frq_cen, fwidth=dfrq), component=mp.Ex,  amplitude=1.0, center=source_center, size=source_size)]

# set up a reference sim to get the FDTD resolution
sim_ref = mp.Simulation(resolution=resolution,
                    cell_size=cell_size,
                    boundary_layers=pml_layers,
                    k_point=k_boundary,
                    default_material=mat_air,
                    sources=sources)
sim_ref.init_sim()
design_center=mp.Vector3(0,0,-0.5*sz+d_pml+thick_air+0.5*thick_active)
design_size=mp.Vector3(sx,sy,thick_active)

# set design resolution exactly the same as FDTD resolution
Nax, Nay, Naz = sim_ref.get_array(center=design_center, size=design_size, component=mp.Dielectric)[1:-1,1:-1,:].shape

#set up design region (active layer). Randomly initialized + gaussian filter
np.random.seed(0) 
active_layer3D = gaussian_filter(np.random.rand(Nax, Nay, Naz),  1.0, mode='nearest', cval=0).ravel()

designVariable = mp.MaterialGrid(mp.Vector3(Nax,Nay,Naz), mat_air, mat_active, grid_type="U_DEFAULT")
designRegion = mpa.DesignRegion(designVariable,volume=mp.Volume(center=mp.Vector3(0,0,-0.5*sz+d_pml+thick_air+0.5*thick_active), size=mp.Vector3(sx,sy,thick_active)))
designRegions = [designRegion]
active_layer3D_list = [active_layer3D]

geometry = [mp.Block(material=designVariable, size=designRegion.size, center=designRegion.center)]

# set up main simulation

sim = mp.Simulation(resolution=resolution,
                    cell_size=cell_size,
                    boundary_layers=pml_layers,
                    geometry=geometry,
                    k_point=k_boundary,
                    default_material=mat_air,
                    sources=sources,
                    eps_averaging=False,  # turn off/on subpixel averaging
                    )

monitorPosition = mp.Vector3(0,0,-0.5*sz+d_pml+thick_air+thick_active+2/3*thick_air)
Ex_monitor = mpa.FourierFields(sim,volume=mp.Volume(center=monitorPosition,size=mp.Vector3(sx,sy,0)), component = mp.Ex)
Hy_monitor = mpa.FourierFields(sim,volume=mp.Volume(center=monitorPosition,size=mp.Vector3(sx,sy,0)), component = mp.Hy)
ob_list = [Ex_monitor, Hy_monitor]

#simple objetive function (flux of a single polarization)
def J(Ex, Hy):
    obj = npa.sum(npa.real(npa.multiply(Ex[0][1:-1,1:-1],npa.conj(Hy[0][1:-1,1:-1]))))
    return obj

opt = mpa.OptimizationProblem(
    simulation=sim,
    objective_functions=J,
    objective_arguments=ob_list,
    design_regions=designRegions,
    frequencies=np.array([frq_cen]),
    minimum_run_time = 300,
    maximum_run_time = 300,
    nf=1,
)

opt.update_design(active_layer3D_list)
f0, dJ_du = opt()

# plot a slice of the gradient
fig = plt.figure(figsize = (8,5))
ax  = fig.add_subplot(1,1,1)
plt.imshow(dJ_du.reshape(Nax,Nay,Naz)[:,:,Naz//2],  vmin = 0,  vmax = np.abs(dJ_du).max())
fig.savefig(f'adjoint_gradient.png',  dpi=fig.dpi, transparent = False, facecolor='white')

#compute finite-difference gradients
db,choose = 1e-4,5
g_discrete, idx = opt.calculate_fd_gradient(num_gradients=choose,db=db)
g_discrete = np.array(g_discrete).flatten()

from scipy import stats
g_adjoint = dJ_du.flatten()
res = stats.linregress(g_discrete,g_adjoint[idx])

print("Randomly selected indices: ",idx)
print("Finite-difference gradient: ",g_discrete)
print("Adjoint gradient: ",g_adjoint[idx])

#plot the Finite-difference-adjoint-comparison
fig = plt.figure(figsize = (8,5))
ax  = fig.add_subplot(1,1,1)
ax.plot(g_discrete,g_adjoint[idx],"ro")
ax.plot([g_discrete.min(), g_discrete.max()],    [res.intercept.min() + res.slope*g_discrete.min(), res.intercept.max() + res.slope*g_discrete.max()]  , '--', label='fitted line')
ax.set_xlabel("Finite-difference gradient")
ax.set_ylabel("Adjoint gradient")
fig.savefig(f'Finite-difference-adjoint-comparison.png',  dpi=fig.dpi, transparent = False, facecolor='white')

Running this script yields gradients like the following plot:

Finite-difference-adjoint-comparison

Raising the resolution to 40, and changing the value of the gaussian filter to apply to the initial design from 1.0 to 2.0, still produces inconsistent gradients:

Finite-difference-adjoint-comparison

smartalecH commented 1 year ago

Ya adjoint gradients with periodic boundaries are a bit hairy because it's hard for the fields to converge. I know you already ran it for a really long time, but here are a few suggestions.

Try swapping your PML out for absorbers. Maybe make them a bit larger (do some convergence checks) and remove the maximum_runtime limit. The fields should decay away reasonably well with the absorbers (no need to "force" the simulation to terminate).

Try playing with the finite difference step size. Make sure you have enough samples, etc.

Typically, we opt for a directional derivative, rather than several different samples to compare to your gradient (I know our example notebook does the linear regression thing but it's more for show...). All of the adjoint tests do this, so you can look at those for an example.

Maybe try in 2D first. Make sure you get decent agreement there. Then project out into that third dimension. When ready.

Also, you can try a single frequency. That may make things easier for now (or at least identify a bug).

smartalecH commented 1 year ago

@mochen4 any other suggestions?

mochen4 commented 1 year ago

We tried some optimization in 3D with a similar setup (periodic boundary condition and mpa.FourierFields of mp.Ex), and had decent results, so I believe the adjoint gradients should be correct. As far I can recall, we also explicitly checked the gradients in 3D. So there shouldn't be any issue.

Regarding the parameters in your test script, I would use a larger resolution (30), smaller finite difference size (~1e-7), and longer maximum_run_time (500 or 1000). You can also try absorber or 2D or directional derivative as Alec suggested, but personally I think it should still work without those changes.

Another thing I would try is a different objective function that involves only Ex or Hy fields, maybe even only at one single point. As I said, there really shouldn't be any issue.

rafael-fuente commented 1 year ago

@mochen4 @smartalecH I extensively studied this problem by running it on a computer cluster. I've used resolutions as large as 100, runtimes of 1500, PMLs and spacing size = 2.0, and adjoint gradients are still inconsistent with finite-difference gradients. (and also, compared with frequency domain solvers with direct differentiability modeling the same setup)

When trying to do an optimization, optimization algorithms usually get stuck in many steps; no matter how small, the step doesn't lower the loss value.

However, I tried the same setup in 2D, and optimization works fine, and gradients are consistent even for a resolution of 20, so the problem comes exclusively from 3D.

@mochen4 What was the 3D setup you said you got the gradients working?

smartalecH commented 1 year ago

and gradients are consistent even for a resolution of 20

The gradients shouldn't really be sensitive to resolution anymore. We implemented the adjoint for the actual discrete operator awhile back.

Sorry it's giving you so much trouble. Try some of the other suggestions we gave (eg absorber instead of PML, smaller FD step sizes) and let us know how it goes.

mochen4 commented 1 year ago

@mochen4 What was the 3D setup you said you got the gradients working?

Try an objective function of npa.abs(Ex[0, a, b])**2, where a, b are the indices of some specific point in the monitor region (e.g. a=2, b=1).

You can also shrink the size of simulation in each direction to lower the time cost.