Closed: mawc2019 closed this issue 2 years ago.
Are you using your version of the DFT fields that uses the same IndexSrc as the FarFields adjoint source? I'm guessing the issue lies there, since I've never seen this with Eigenmode adjoint sources (and I run with multiple groups quite regularly).
I was using two different versions of the FourierFields adjoint solver. One of them is the version in the master branch that does not use the IndexedSource, and the other is my modified version that uses the IndexedSource. This problem occurs in both versions.
One of them is the version in the master branch that does not use the IndexedSource
Hmm that's odd. Maybe try two things:
First, swap out the FourierFields quantity for an Eigenmode quantity. If you still get a discrepancy, then it's probably not related to the adjoint solver itself.
Also, can you post an MWE (I think you posted some code on a different issue with a similar problem)? It would be interesting to see if you are doing any rank-dependent processing (e.g. with am_master()).
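For reference, a minimal, self-contained sketch of the kind of rank-dependent processing being asked about (intended to be run with e.g. mpirun -np 4; divide_parallel_processes, am_master, and merge_subgroup_data are the standard Meep helpers, while the data itself is made up):
# Illustrative sketch only: anything computed on only some ranks, or anything
# non-deterministic per process, can make the merged per-group results disagree.
import meep as mp
import numpy as np

n = mp.divide_parallel_processes(4)  # index of this process's subgroup

data = np.zeros(3)
if mp.am_master():
    # Only some processes execute this branch, so state computed here can
    # differ between processes/groups unless it is deterministic and broadcast.
    data = np.random.rand(3)

merged = mp.merge_subgroup_data(data)  # one copy per subgroup along the last axis
print(merged.shape)  # e.g. (3, 4) with 4 subgroups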
I was using two different versions of the Near2FarFields adjoint solver. One of them is the version in the master branch that does not use the IndexedSource
Just reread this. Are you sure you are using the master branch for Near2Far adjoints? The master branch does use the IndexedSource. Did you mean to say you have a custom branch that somehow does not use an IndexedSource, and it still fails?
I am sorry, that was a typo. What I meant was FourierFields. It is corrected now.
Although I mentioned this problem in my reply in #1676, the issue there seems to be unrelated.
Hi @smartalecH, this problem also occurs in EigenmodeCoefficient. A small example is as follows. (I have revised my previous posts in this issue in case they are misleading.)
import meep as mp
import meep.adjoint as mpa
import numpy as np
from autograd import numpy as npa
Si = mp.Medium(index=3.48)
SiO2 = mp.Medium(index=1.45)
Air = mp.Medium(index=1)
num_groups = 4
n = mp.divide_parallel_processes(num_groups)
resolution,design_resolution = 20,20
design_region_width,design_region_height = 2.0,0.2
Lx,Ly = design_region_width+0.0,4
dpml = 1.5
monitor_height,monitor_width = design_region_height/2,Lx
pml_layers = [mp.PML(dpml, direction=mp.Y)]
cell_size = mp.Vector3(Lx,Ly+2*dpml)
fcen,fwidth = 1/1.55,0.01/1.55
k = mp.Vector3(fcen*1.45).rotate(mp.Vector3(z=1), np.radians(100))
source_center,source_size = mp.Vector3(0,-Ly/4),mp.Vector3(Lx,0)
src = mp.GaussianSource(frequency=fcen,fwidth=fwidth)
source = [mp.EigenModeSource(src,eig_band = 1,direction=mp.NO_DIRECTION,eig_kpoint=k,size = source_size,center=source_center)]
Nx,Ny = int(round(design_resolution*design_region_width)),1
design_variables = mp.MaterialGrid(mp.Vector3(Nx,Ny),Air,Si,grid_type='U_MEAN')
design_region = mpa.DesignRegion(design_variables,volume=mp.Volume(center=mp.Vector3(0,design_region_height/2),size=mp.Vector3(design_region_width,design_region_height)))
geometry = [mp.Block(center=mp.Vector3(0,-Ly/4-dpml/2), material=SiO2, size=mp.Vector3(Lx,Ly/2+dpml, 0)), # substrate
mp.Block(center=design_region.center, size=design_region.size, material=design_variables)] # design region
sim = mp.Simulation(
cell_size=cell_size,boundary_layers=pml_layers,geometry=geometry,sources=source,k_point=k,eps_averaging=False,resolution=resolution)
TE0 = mpa.EigenmodeCoefficient(sim,mp.Volume(center=mp.Vector3(0,monitor_height),size=mp.Vector3(monitor_width,0)),mode=1)
ob_list = [TE0]
def J(alpha):
    return npa.abs(alpha)**2
opt = mpa.OptimizationProblem(
simulation = sim,objective_functions = J,objective_arguments = ob_list,design_regions = [design_region],
fcen=fcen,df = 0,nf = 1,decay_fields=[mp.Ez],maximum_run_time = 2000)
x = 0.5*np.ones(Nx*Ny)
f_single, dJ_single = opt([x])
f_multiple = mp.merge_subgroup_data(f_single)
dJ_multiple = mp.merge_subgroup_data(dJ_single)
np.savetxt("emc_value_0819.out",f_multiple) # save the function values for comparison
np.savetxt("emc_grad_0819.out",dJ_multiple) # save the gradients for comparison
When I run the code multiple times, the function values f_multiple may not remain the same every time. In addition, all entries of f_multiple should be identical, but sometimes they are not. Likewise, the gradients dJ_multiple may not remain the same every time. All columns of dJ_multiple should be identical, but sometimes they are not.
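For reference, a quick check of these observations could be appended to the script above (this assumes the merged arrays carry one entry, or one column, per subgroup, as described):
# Quick consistency check, appended to the script above: every subgroup solves
# the identical problem, so the merged results should agree across groups.
print("spread of f over groups:", np.ptp(f_multiple))  # expect ~0
print("dJ columns identical over groups:",
      np.allclose(dJ_multiple, dJ_multiple[:, :1]))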
The indeterminacy is not caused by the adjoint solver, because it also happens in a simulation without the adjoint solver. An example similar to the previous one is as follows.
import meep as mp
import numpy as np
num_groups = 4
n = mp.divide_parallel_processes(num_groups)
resolution = 20
Lx,Ly = 2,4
dpml = 1.5
monitor_height,monitor_width = 0.1,Lx
pml_layers = [mp.PML(dpml, direction=mp.Y)]
cell_size = mp.Vector3(Lx,Ly+2*dpml)
fcen,fwidth = 1/1.55,0.01/1.55
theta_in = np.radians([90,100,110,120])
k = mp.Vector3(fcen*1.45).rotate(mp.Vector3(z=1), theta_in[n])
source_center,source_size = mp.Vector3(0,-Ly/4),mp.Vector3(Lx,0)
src = mp.GaussianSource(frequency=fcen,fwidth=fwidth)
source = [mp.EigenModeSource(src,eig_band = 1,direction=mp.NO_DIRECTION,eig_kpoint=k,size = source_size,center=source_center)]
geometry = [mp.Block(center=mp.Vector3(), material=mp.Medium(index=1), size=mp.Vector3(Lx,Ly+2*dpml, 0))]
sim = mp.Simulation(
cell_size=cell_size,boundary_layers=pml_layers,geometry=geometry,sources=source,k_point=k,eps_averaging=False,resolution=resolution)
mode_monitor = sim.add_mode_monitor(fcen,0,1,mp.ModeRegion(center=mp.Vector3(0,monitor_height),size=mp.Vector3(monitor_width,0)),yee_grid=True)
sim.run(until_after_sources=100)
emc = sim.get_eigenmode_coefficients(mode_monitor, [1], direction=mp.AUTOMATIC)[0]
emc_abs = np.abs(mp.merge_subgroup_data(emc)).flatten()
np.savetxt("emc_test_0820.out",emc_abs)
Looks like your mode source has a different k_point on each processor group, right?
k = mp.Vector3(fcen*1.45).rotate(mp.Vector3(z=1), theta_in[n])
That will launch a different mode source in each sim, which should explain the discrepancy.
But if I run your adjoint code enough times, I do start to see some discrepancies in f_multiple among the processor groups, although it's pretty small:
[98319.49482893 98319.49676238 98319.49676238 98319.49676238]
Interestingly, it's the am_really_master group that is slightly different. Not sure where the discrepancy could be coming from (especially since the first few times I run it, the results are identical; perhaps there's some weird caching going on under the hood that is not initially broadcast to all the groups?)
cc @stevengj
Looks like your mode source has a different k_point on each processor group, right?
Yes, this is the case in the second example, and the results for different k_point are not expected to be the same. I also tried using the same k_point for every parallel process, just as in the first example, but I did not see varying results over multiple runs. (The difference might appear if I ran the code more times.) Therefore, I used a different k_point for each parallel process in the second example.
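(Using the same k_point for every group, as in the first example, amounts to replacing the per-group angle with a fixed one, e.g.:)
# Give every subgroup the same incidence angle, and hence the same mode source:
k = mp.Vector3(fcen*1.45).rotate(mp.Vector3(z=1), np.radians(100))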
I'm a bit confused. Doesn't this mean that the divide_parallel_processes() function is working correctly, since you were unable to see a discrepancy outside of the adjoint solver? In other words, the discrepancy really is inside the adjoint solver itself?
Doesn't this mean that the divide_parallel_processes() function is correctly working,
Up to now, I have only seen the indeterminacy when divide_parallel_processes() is used. I tried to reproduce the indeterminacy without divide_parallel_processes(), but it did not occur. Even so, that does not mean the indeterminacy is rooted in divide_parallel_processes().
since you were unable to see a discrepancy outside of the adjoint solver?
The second example shows that the indeterminacy also happens when the adjoint solver is not used.
In other words, the discrepancy really is inside the adjoint solver itself?
As the second example shows, the indeterminacy also happens when the adjoint solver is not used, so I do not think this issue is caused by the adjoint solver. However, when optimization (using the adjoint solver) is performed, the effect of the tiny discrepancy may sometimes accumulate and become conspicuous. In other words, optimization with the adjoint solver may be more susceptible to the indeterminacy than a conventional simulation.
I just noticed that the MPB eigensolver can re-initialize with pseudo-random numbers under some circumstances: https://github.com/NanoComp/mpb/blob/af31eb16e26e3e213b49c89bf68cd512fb29e18c/src/matrices/eigensolver.c#L342-L343
This could cause slight variations between runs. Maybe we could try replacing that with something deterministic? (This affects both eigenmode sources and eigenmode coefficients, but won't affect you if you aren't using eigenmodes.)
I replaced the pseudo-random numbers with deterministic numbers, but the problem of indeterminacy remains. Moreover, the indeterminacy also happens when the Near2FarFields or FourierFields adjoint solvers are used without eigenmode sources.
In the previous examples, the indeterminacy only caused small variations among different runs. However, large variations may also appear. In the example below, the adjoint gradient g_adjoint can vary from normal values to enormous numbers or nan. Those abnormal values of g_adjoint might share the same origin as #1676. In addition, in the previous examples the indeterminacy was observed only when divide_parallel_processes is used, but at least in the current version, the indeterminacy can also happen without divide_parallel_processes, as shown in the example below.
import meep as mp
import meep.adjoint as mpa
import numpy as np
from autograd import numpy as npa
Si, SiO2 = mp.Medium(index=3.4), mp.Medium(index=1.44)
resolution,design_resolution = 20,20
Lx,Ly,pml_size = 2,4,1
cell_size = mp.Vector3(Lx,Ly+2*pml_size)
pml_layers = [mp.PML(pml_size, direction=mp.Y)]
design_width,design_height = 2,0.4
fcen,fwidth = 1/1.55,0.01/1.55
source_center,source_size = mp.Vector3(0,-Ly/4),mp.Vector3(Lx,0)
src = mp.GaussianSource(frequency=fcen,fwidth=fwidth)
source = [mp.Source(src,component=mp.Ez,center=source_center,size=source_size)]
Nx,Ny = int(round(design_resolution*design_width)),int(round(design_resolution*design_height))
design_variables = mp.MaterialGrid(mp.Vector3(Nx,Ny),SiO2,Si,grid_type='U_MEAN')
design_region = mpa.DesignRegion(design_variables,volume=mp.Volume(center=mp.Vector3(), size=mp.Vector3(design_width, design_height)))
geometry = [mp.Block(center=design_region.center, size=design_region.size, material=design_variables)] # design region
sim = mp.Simulation(
cell_size=cell_size,boundary_layers=pml_layers,geometry=geometry,
sources=source,default_material=SiO2,resolution=resolution)
far_pt = [mp.Vector3(0,20,0)]
NearRegions = [mp.Near2FarRegion(center=mp.Vector3(0,design_height+0.2), size=mp.Vector3(Lx,0), weight=+1)]
FarFields = mpa.Near2FarFields(sim, NearRegions,far_pt)
ob_list = [FarFields]
def J(alpha):
    return npa.abs(alpha[0,0,2])**2
opt = mpa.OptimizationProblem(
simulation = sim,objective_functions = J,objective_arguments = ob_list,
design_regions = [design_region],fcen=fcen,df = 0,nf = 1,decay_by = 1e-5)
x0 = 0.5*np.ones(Nx*Ny)
opt.update_design([x0])
f0, g_adjoint = opt()
np.random.seed(1905)
g_discrete, idx = opt.calculate_fd_gradient(num_gradients=5,db=1e-4)
g_discrete = np.array(g_discrete).flatten()
print("Randomly selected indices: ",idx)
print("Finite-difference gradient: ",g_discrete)
print("Adjoint gradient: ",g_adjoint[idx])
It seems like the next step is to carefully trace where in the gradient calculation the results start to diverge between groups. Since you have a test case where the results are so different, the process should be a lot simpler (or at least our error should be more obvious than before). Here are the key places in the code I would check:
My guess is that it's independent of the adjoint solver, so hopefully we can track down the issue in step 1. But if not, we have a way to carefully see where things are going wrong.
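As a starting point for that tracing, one assumed approach is to append something like the following to the example above: record the forward value and the adjoint gradient of each run to disk and compare against the first run, so it is at least clear whether and by how much the final outputs already disagree between repeated runs (the file names are arbitrary):
# Hedged sketch: save f0 and g_adjoint from each run and compare with run 0.
import os

if mp.am_master():  # write and compare only once
    run_id = 0
    while os.path.exists("run_{}.npz".format(run_id)):
        run_id += 1
    np.savez("run_{}.npz".format(run_id), f0=np.atleast_1d(f0), g=g_adjoint)
    if run_id > 0:
        ref = np.load("run_0.npz")
        print("forward value matches run 0:   ",
              np.allclose(np.atleast_1d(f0), ref["f0"]))
        print("adjoint gradient matches run 0:",
              np.allclose(g_adjoint, ref["g"]))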
In this example, the forward run does not produce abnormal values, and the results of the forward run do not change even when the results of the adjoint run change violently. I checked the scaling of the adjoint sources by printing the results here; these results do not change, either. The problem appears in the adjoint fields: I saw the indeterminacy and abnormal values when printing the results of get_dft_array here.
My guess is that it's independent of the adjoint solver
Probably, as shown in the example on August 20th.
Hmm, so the adjoint scaling is consistent, but the fields that are recorded are different. So now we need to see whether the adjoint sources themselves are identical between groups, or whether there is a bug in the DFT monitors between groups.
As a quick test, maybe take a snapshot of the adjoint simulation (with the fields plotted) on each group a few seconds after the simulation starts. Hopefully we see a big difference. plot2D() will plot the fields/simulation domain for each group:
import matplotlib.pyplot as plt  # needed for the figure calls below

opt.prepare_forward_run()
opt.forward_run()
opt.prepare_adjoint_run()
opt.sim.reset_meep()
opt.sim.change_sources(opt.adjoint_sources[0])
opt.sim.run(until=5)
if mp.am_master():
    plt.figure()
    opt.sim.plot2D(fields=mp.Hz)
    plt.savefig("fields_{}.png".format(my_group))  # my_group: the subgroup index
    plt.show()
Before plotting figures, I changed fields=mp.Hz to fields=mp.Ez in order to be consistent with our setup in the above example. When running opt.sim.plot2D(fields=mp.Ez), I got
File "/home/gridsan/mawc/install/lib/python3.7/site-packages/meep/simulation.py", line 4151, in plot2D
**kwargs)
File "/home/gridsan/mawc/install/lib/python3.7/site-packages/meep/visualization.py", line 564, in plot2D
ax = plot_sources(sim,ax,output_plane=output_plane,labels=labels,source_parameters=source_parameters)
File "/home/gridsan/mawc/install/lib/python3.7/site-packages/meep/visualization.py", line 442, in plot_sources
vol = Volume(center=src.center,size=src.size)
AttributeError: 'IndexedSource' object has no attribute 'center'
So I disabled plot_sources in visualization.py. Then I ran the code multiple times, but the figures did not change between runs. I also changed opt.sim.run(until=5) to opt.sim.run(until=500); the situation remained the same.
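(As an aside, depending on the Meep version, plot2D exposes flags controlling which overlays it draws, so the AttributeError above can presumably also be sidestepped without editing visualization.py; the plot_sources_flag keyword is assumed to be available here:)
# Assumed alternative to editing visualization.py: ask plot2D not to draw the
# sources, so it never touches the IndexedSource objects.
opt.sim.plot2D(fields=mp.Ez, plot_sources_flag=False)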
Okay as a true check, we should pull the raw time-domain fields (at any arbitrary time) and make sure they are consistent across all groups.
But it seems that something is going on with the recording of the fields themselves (within the DFT monitors).
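A rough sketch of such a check, reusing the snippet above (get_array and merge_subgroup_data are standard Meep calls; the component, sampling volume, and stopping time are arbitrary choices, and n is the subgroup index returned by divide_parallel_processes):
# Hedged sketch: sample the raw time-domain Ez of the adjoint run at an
# arbitrary time and compare it across the subgroups.
opt.prepare_forward_run()
opt.forward_run()
opt.prepare_adjoint_run()
opt.sim.reset_meep()
opt.sim.change_sources(opt.adjoint_sources[0])
opt.sim.run(until=5)  # arbitrary stopping time

ez = opt.sim.get_array(component=mp.Ez, center=mp.Vector3(), size=opt.sim.cell_size)
ez_all = mp.merge_subgroup_data(ez)  # last axis indexes the subgroups

if mp.am_master():
    # identical adjoint sources and geometry should give identical raw fields
    print("group", n, "max |Ez| deviation from group 0:",
          np.abs(ez_all - ez_all[..., :1]).max())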
Any updates on this?
I have not seen this problem again when running this example on the current master branch.
In previous examples, the indeterminacy only causes small variations among different runs. However, large variations may also appear. In the example below, the adjoint gradient g_adjoint can vary from normal values to enormous numbers or nan. [...]
When using the Near2FarFields or FourierFields adjoint solver with divide_parallel_processes, I get different results in different runs. The differences may become obvious after a number of optimization steps.