improve CW solver convergence properties using preconditioning

oskooi commented 5 years ago

Currently, the frequency-domain solver does not use any preconditioning to improve the convergence properties of its biconjugate gradient iterative method. Thus, there is potential for performance gains which could enable this feature to become more widely used; particularly for cases involving e.g., oblique planewaves which require narrow-band sources (as described in #471).

stevengj commented 5 years ago

@wsshin has done a lot of work on this sort of thing. Preconditioning wave equations (Helmholtz equations) is notoriously hard, and there aren't that many good techniques. However, Wonseok has identified two things that work pretty well: changing the PML (or doing an equivalent field preconditioner) and adding an operator to shift quasistatic eigenvalues. Diagonal (Jacobi) preconditioners, on the other hand, don't work that well. See:

wsshin commented 5 years ago

On top of the two techniques @stevengj mentioned, you can get an additional 2X improvement by making the matrix symmetric in BiCG. Then you can modify the original BiCG algorithm to something that does not require application of Aᵀ (because Aᵀ = A).

Making the operator symmetric is actually not trivial. As far as I remember MEEP supports only the square grid, but the stretched-coordinate PML stretches the edges of the squares along the Cartesian directions inside PML and makes the squares effectively rectangles. On such a rectangular grid, the finite-difference double-curl operator ∇×∇× is no longer symmetric. To make it symmetric, you need to apply special symmetrizing preconditioners.

There are infinitely many symmetrizing preconditioners, but not all of them are good because most of them ill-condition the matrix though symmetrizing it. One way to construct symmetrizing preconditioners that preserve the conditioning is described in https://patents.google.com/patent/US20140207426A1/en.

If the boundary condition is Bloch with nonzero phase differences (to implement oblique incidence), you cannot make the operator symmetric even with this technique.

I have several other ideas to accelerate the finite-difference Maxwell solvers in the frequency domain. I will eventually test and implement them in https://github.com/wsshin/MaxwellFDM.jl, but you are welcome to do so in MEEP. We can discuss these ideas later if you are interested!

oskooi commented 5 years ago

Thanks for the info. Is there an approach for non-symmetric operators that would be relatively simple to implement which we could use for benchmarking?

oskooi commented 5 years ago

One simple approach to enhancing the convergence rate of the biCG solver is to increase the L parameter from its default value of 2 to e.g., 10. This is described in the documentation. However, it seems that large L values lead to instabilities. This is demonstrated in the following example which is adapted from the binary grating tutorial involving an oblique planewave source. The only change involves replacing the time stepping with the CW solver.

import meep as mp
import math
import cmath
import numpy as np

resolution = 50        # pixels/μm                                                                                                                                            

tol = 1e-7             # CW solver tolerance                                                                                                                                  
max_iters = 3000       # CW solver max iterations
L = 10                 # CW solver L                                

dpml = 1.0             # PML thickness                                                                                                                                        
dsub = 1.0             # substrate thickness                                                                                                                                  
dpad = 1.0             # length of padding between grating and pml                                                                                                            
gp = 6.0               # grating period                                                                                                                                       
gh = 0.5               # grating height                                                                                                                                       
gdc = 0.5              # grating duty cycle                                                                                                                                   

sx = dpml+dsub+gh+dpad+dpml
sy = gp

cell_size = mp.Vector3(sx,sy,0)

# replace anisotropic PML with isotropic Absorber to attenuate parallel-directed fields of oblique source                                                                     
abs_layers = [mp.Absorber(thickness=dpml,direction=mp.X)]

wvl = 0.5              # center wavelength                                                                                                                                    
fcen = 1/wvl           # center frequency                                                                                                                                     
df = 0.05*fcen         # frequency width                                                                                                                                      

ng = 1.5
glass = mp.Medium(index=ng)

# rotation angle of incident planewave; CCW about Z axis, 0 degrees along +X axis                                                                                             
theta_in = math.radians(22.9)

# k (in source medium) with correct length (plane of incidence: XY)                                                                                                           
k = mp.Vector3(math.cos(theta_in),math.sin(theta_in),0).scale(fcen*ng)

symmetries = []
eig_parity = mp.ODD_Z
if theta_in == 0:
  k = mp.Vector3(0,0,0)
  symmetries = [mp.Mirror(mp.Y)]
  eig_parity += mp.EVEN_Y

def pw_amp(k,x0):
  def _pw_amp(x):
    return cmath.exp(1j*2*math.pi*k.dot(x+x0))
  return _pw_amp

src_pt = mp.Vector3(-0.5*sx+dpml+0.3*dsub,0,0)
sources = [mp.Source(mp.ContinuousSource(fcen,fwidth=df),
                     component=mp.Ez,
                     center=src_pt,
                     size=mp.Vector3(0,sy,0),
                     amp_func=pw_amp(k,src_pt))]

sim = mp.Simulation(resolution=resolution,
                    cell_size=cell_size,
                    boundary_layers=abs_layers,
                    k_point=k,
                    force_complex_fields=True,
                    default_material=glass,
                    symmetries=symmetries,
                    sources=sources)

refl_pt = mp.Vector3(-0.5*sx+dpml+0.5*dsub,0,0)
refl_flux = sim.add_flux(fcen, 0, 1, mp.FluxRegion(center=refl_pt, size=mp.Vector3(0,sy,0)))

sim.init_sim()
sim.solve_cw(tol, max_iters, L)

input_flux = mp.get_fluxes(refl_flux)
input_flux_data = sim.get_flux_data(refl_flux)

sim.reset_meep()

geometry = [mp.Block(material=glass, size=mp.Vector3(dpml+dsub,mp.inf,mp.inf), center=mp.Vector3(-0.5*sx+0.5*(dpml+dsub),0,0)),
            mp.Block(material=glass, size=mp.Vector3(gh,gdc*gp,mp.inf), center=mp.Vector3(-0.5*sx+dpml+dsub+0.5*gh,0,0))]

sim = mp.Simulation(resolution=resolution,
                    cell_size=cell_size,
                    boundary_layers=abs_layers,
                    geometry=geometry,
                    k_point=k,
                    force_complex_fields=True,
                    symmetries=symmetries,
                    sources=sources)

refl_flux = sim.add_flux(fcen, 0, 1, mp.FluxRegion(center=refl_pt, size=mp.Vector3(0,sy,0)))
sim.load_minus_flux_data(refl_flux,input_flux_data)

tran_pt = mp.Vector3(0.5*sx-dpml-0.5*dpad,0,0)
tran_flux = sim.add_flux(fcen, 0, 1, mp.FluxRegion(center=tran_pt, size=mp.Vector3(0,sy,0)))

sim.init_sim()
sim.solve_cw(tol, max_iters, L)

nm_r = np.floor((fcen*ng-k.y)*gp)-np.ceil((-fcen*ng-k.y)*gp) # number of reflected orders                                                                                     
if theta_in == 0:
  nm_r = nm_r/2 # since eig_parity removes degeneracy in y-direction                                                                                                          
nm_r = int(nm_r)

res = sim.get_eigenmode_coefficients(refl_flux, range(1,nm_r+1), eig_parity=eig_parity)
r_coeffs = res.alpha

Rsum = 0
for nm in range(nm_r):
  r_kdom = res.kdom[nm]
  Rmode = abs(r_coeffs[nm,0,1])**2/input_flux[0]
  r_angle = np.sign(r_kdom.y)*math.acos(r_kdom.x/(ng*fcen))
  print("refl:, {}, {:.2f}, {:.8f}".format(nm,math.degrees(r_angle),Rmode))
  Rsum += Rmode

nm_t = np.floor((fcen-k.y)*gp)-np.ceil((-fcen-k.y)*gp)       # number of transmitted orders                                                                                   
if theta_in == 0:
  nm_t = nm_t/2 # since eig_parity removes degeneracy in y-direction                                                                                                          
nm_t = int(nm_t)

res = sim.get_eigenmode_coefficients(tran_flux, range(1,nm_t+1), eig_parity=eig_parity)
t_coeffs = res.alpha

Tsum = 0
for nm in range(nm_t):
  t_kdom = res.kdom[nm]
  Tmode = abs(t_coeffs[nm,0,0])**2/input_flux[0]
  t_angle = np.sign(t_kdom.y)*math.acos(t_kdom.x/fcen)
  print("tran:, {}, {:.2f}, {:.8f}".format(nm,math.degrees(t_angle),Tmode))
  Tsum += Tmode

print("mode-coeff:, {:.6f}, {:.6f}, {:.6f}".format(Rsum,Tsum,Rsum+Tsum))

r_flux = mp.get_fluxes(refl_flux)
t_flux = mp.get_fluxes(tran_flux)
Rflux = -r_flux[0]/input_flux[0]
Tflux =  t_flux[0]/input_flux[0]
print("poynting-flux:, {:.6f}, {:.6f}, {:.6f}".format(Rflux,Tflux,Rflux+Tflux))

At a resolution of 50, the iterative solver is unstable which leads to a field divergence:

resolution=50, L=10

...
tran:, 21, -66.39, nan
tran:, 22, 66.49, nan
mode-coeff:, nan, nan, nan
poynting-flux:, nan, nan, nan

It is only when the resolution is doubled to 100 that the solver becomes stable:

resolution=100, L=10

...
tran:, 22, 66.49, 0.00905274
mode-coeff:, 0.125614, 0.872006, 0.997620
poynting-flux:, 0.128263, 0.871365, 0.999628
...

The resolution=100 and L=10 case has a runtime on my machine of 2006.7478 s. For comparison, the case with resolution=50 and L=2, which is stable, has a runtime of 38.1021 s (i.e., more than 50 times faster):

...
tran:, 22, 66.49, 0.00755522
mode-coeff:, 0.107293, 0.878687, 0.985980
poynting-flux:, 0.122289, 0.877775, 1.000064

Thus, it seems that increasing the L parameter is not a practical strategy for enhancing the convergence rate since it also requires increasing the resolution to preserve stability and thereby the runtimes.

This may indicate that pursuing alternative strategies for improving the CW solver's stability in addition to its convergence rate, using preconditioning or other methods, would also be important for enabling this feature to be more widely used.

oskooi commented 5 years ago

For ω=0 (or alternatively λ=∞) which is necessary for electrostatic calculations, the CW solver fails to converge. An infinitesimally small ω, as a potential workaround, has poor convergence. A suitable preconditioner for this case could therefore also be useful.

stevengj commented 5 years ago

For ω=0 the matrix is singular, so more care is needed to get a solution at all, or a unique solution.

wsshin commented 5 years ago

Electrostatics is a vastly different problem. For example, PML doesn't work for electrostatics. As far as I know, simulating an unbounded domain is therefore challenging for electrostatics.

NanoComp / meep

improve CW solver convergence properties using preconditioning #548