Optimization of multiple objectives

aj-adkins commented 1 year ago

I am trying to learn how to work with optimizing multiple FOMs. As a basic test I just modified the splitter tutorial example to be non-symmetric and have two objective functions, but it does not work. I saw an issue posted a number of months ago about multiple objectives not working, but has this been addressed?

For context, my ultimate goal is to optimize a 2x2 directional coupler with an uneven splitting ratio (if this is possible) which I believe would require 4 sources and 4 FOMs to truly optimize. Any help is appreciated if anyone has insight into this. Thanks

I am using Python 3.10 if that has any effect.

smartalecH commented 1 year ago

be non-symmetric and have two objective functions, but it does not work

Can you be more specific? Is it crashing? Or is it just not optimizing? Have you done some basic debugging, like printing out both FOMs at each iteration? Are you saving images of the gradients and design fields?

aj-adkins commented 1 year ago

Sorry for the lack of specifics. I tried debugging steps like you mentioned on my own design without much success, so I decided to just try adding two objectives to the given example as a basic test to understand the functionality. I kept everything the same in the splitter example but removed the symmetry and split the two output ports into two objective functions like so:

def J1(source, top, bottom):
    power = npa.abs(top / source) ** 2
    return npa.mean(power)

def J2(source, top, bottom):
    power = npa.abs(bottom / source) ** 2
    return npa.mean(power)

opt = mpa.OptimizationProblem(
    simulation=sim,
    objective_functions=[J1, J2],
    objective_arguments=ob_list,
    design_regions=[design_region],
    frequencies=frequencies,
    decay_by=1e-5,
)

The optimization is crashing after one iteration with the following stack trace:

ValueError                                Traceback (most recent call last)
Cell In[22], line 27
     25 solver.set_max_objective(lambda a, g: f(a, g, cur_beta))
     26 solver.set_maxeval(update_factor)
---> 27 x[:] = solver.optimize(x)
     28 cur_beta = cur_beta * beta_scale

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/nlopt/nlopt.py:335](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/nlopt/nlopt.py:335), in opt.optimize(self, *args)
    334 def optimize(self, *args):
--> 335     return _nlopt.opt_optimize(self, *args)

Cell In[22], line 25, in (a, g)
     23 solver.set_lower_bounds(lb)
     24 solver.set_upper_bounds(ub)
---> 25 solver.set_max_objective(lambda a, g: f(a, g, cur_beta))
     26 solver.set_maxeval(update_factor)
     27 x[:] = solver.optimize(x)

Cell In[21], line 25, in f(v, gradient, cur_beta)
     22 plt.show()
     24 if gradient.size > 0:
---> 25     gradient[:] = tensor_jacobian_product(mapping, 0)(
     26         v, eta_i, cur_beta, np.sum(dJ_du, axis=1)
     27     )
     29 evaluation_history.append(np.max(np.real(f0)))
     31 cur_iter[0] = cur_iter[0] + 1

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/wrap_util.py:20](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/wrap_util.py:20), in unary_to_nary..nary_operator..nary_f(*args, **kwargs)
     18 else:
     19     x = tuple(args[i] for i in argnum)
---> 20 return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/differential_operators.py:60](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/differential_operators.py:60), in jacobian(fun, x)
     50 @unary_to_nary
     51 def jacobian(fun, x):
     52     """
     53     Returns a function which computes the Jacobian of `fun` with respect to
     54     positional argument number `argnum`, which must be a scalar or array. Unlike
   (...)
     58     (out1, out2, ...) then the Jacobian has shape (out1, out2, ..., in1, in2, ...).
     59     """
---> 60     vjp, ans = _make_vjp(fun, x)
     61     ans_vspace = vspace(ans)
     62     jacobian_shape = ans_vspace.shape + vspace(x).shape

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/core.py:10](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/core.py:10), in make_vjp(fun, x)
      8 def make_vjp(fun, x):
      9     start_node = VJPNode.new_root()
---> 10     end_value, end_node =  trace(start_node, fun, x)
     11     if end_node is None:
     12         def vjp(g): return vspace(x).zeros()

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/tracer.py:10](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/tracer.py:10), in trace(start_node, fun, x)
      8 with trace_stack.new_trace() as t:
      9     start_box = new_box(x, t, start_node)
---> 10     end_box = fun(start_box)
     11     if isbox(end_box) and end_box._trace == start_box._trace:
     12         return end_box._value, end_box._node

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/wrap_util.py:15](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/wrap_util.py:15), in unary_to_nary..nary_operator..nary_f..unary_f(x)
     13 else:
     14     subargs = subvals(args, zip(argnum, x))
---> 15 return fun(*subargs, **kwargs)

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/differential_operators.py:107](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/differential_operators.py:107), in tensor_jacobian_product..vector_dot_fun(*args, **kwargs)
    105 def vector_dot_fun(*args, **kwargs):
    106     args, vector = args[:-1], args[-1]
--> 107     return np.tensordot(vector, fun(*args, **kwargs), axes=np.ndim(vector))

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/tracer.py:44](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/tracer.py:44), in primitive..f_wrapped(*args, **kwargs)
     42 parents = tuple(box._node for _     , box in boxed_args)
     43 argnums = tuple(argnum    for argnum, _   in boxed_args)
---> 44 ans = f_wrapped(*argvals, **kwargs)
     45 node = node_constructor(ans, f_wrapped, argvals, kwargs, argnums, parents)
     46 return new_box(ans, trace, node)

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/tracer.py:48](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/autograd/tracer.py:48), in primitive..f_wrapped(*args, **kwargs)
     46     return new_box(ans, trace, node)
     47 else:
---> 48     return f_raw(*args, **kwargs)

File <__array_function__ internals>:200, in tensordot(*args, **kwargs)

File [~/anaconda3/envs/invdes/lib/python3.10/site-packages/numpy/core/numeric.py:1117](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/root/MEEP/~/anaconda3/envs/invdes/lib/python3.10/site-packages/numpy/core/numeric.py:1117), in tensordot(a, b, axes)
   1115             axes_b[k] += ndb
   1116 if not equal:
-> 1117     raise ValueError("shape-mismatch for sum")
   1119 # Move the axes to sum over to the end of "a"
   1120 # and to the front of "b"
   1121 notin = [k for k in range(nda) if k not in axes_a]

ValueError: shape-mismatch for sum

It seems to me to be a problem with the gradient calculation. Maybe I'm just lacking some understanding of the system setup but it seems this should work.

smartalecH commented 1 year ago

Looks like an array shape error.

Since you have two FOMs, it should return a list of the gradient arrays (corresponding to each FOM for all wavelengths).

Print out the FOM and the gradient so you can see how it's stored, then you can properly backprop through your mapping function etc.

aj-adkins commented 1 year ago

Ah ok, silly oversight on my part. I followed the Schubert el. al. example for backpropagation of two gradients like so, although I'm not sure if this is totally correct. I will admit I do not have much of a background in optimization.

nfrq = len(frequencies)
new_grad = np.zeros((Nx * Ny, 2 * nfrq))
new_grad[:, :nfrq] = dJ_du[0]
new_grad[:, nfrq:] = dJ_du[1]

for i in range(2*nfrq):
    new_grad[:, i] = tensor_jacobian_product(mapping, 0)(v, eta_i, cur_beta, new_grad[:, i])

Also I have the wrapper function returning the sum of the FOMs. The optimization runs now, but the progress starts to look like this odd pattern of vertical stripes.

Screenshot 2023-05-18 110446

Could this be because related to issues with backpropagation? Thank you.

smartalecH commented 1 year ago

So I think you need to take a step back...

What do you plan on doing with two FOMs? Optimization algorithms require you specify a single objective function to optimize. If you have multiple FOMs that you care about (i.e. multi-objective optimization), there are infinitely many ways you can cast that as a single objective function.

IIRC, Martin chose to combine his FOMs using a linear scalarization (@mfschubert feel free to correct me). Typically, in our papers, we would use the chebyshev scalarization, since we were often doing minimax optimization (and we would recast the problem using an epigraph).

Now, the reason that this is important, is that if you plan to simply linearly recombine your FOMs into a single objective, you might as well do that from the very beginning, and avoid having to manually recombine the FOMs and gradients yourself. Meep's adjoint solver was designed to have this kind of flexibility from the beginning. Just define a single J that takes as arguments all the quantities you care about and combines your two FOMs in whatever way you like (so long as it remains differentiable).

BUT, if you plan to move your FOMs into constraint functions (e.g. using an epigraph, or because you have additional constraints) then you need independent adjoint solves like you're doing here (but you shouldn't recombine them...) At this point it's more of a "bookeeping" problem, ensuring all your gradients are in the right shape and that you are passing them correctly to nlopt etc. (we typically use MPI to compute multiple FOMs in parallel... but I don't recommend you start there).

For example, in our meep adjoint paper, we demonstrate broadband optimization of a polarization splitter. In this case, there are 6 FOMs (defined over 10 wavelength points) that we care about (so a total of 60). But we decided to group the TE FOMs together into one group (using nonlinear scalarization) and the TM FOMs together into another group. Since we care about the performance of both modes equally, and we want a flat response, we figure a minimax optimization over these (now 20, 10 aggregate TE and 10 aggregate TM) FOMs is a good choice. But all of those gradients are computed and passed to nlopt through constraint functions, not the actual objective function.

Arguably, you may get the same (or even better performance) if you just lump everything together into a single FOM! We actually tried that, and the minimax results were much much better. But this is the fun part of optimization -- there's never just one way to solve the problem! In fact, @stevengj continuously reminds us that how you set up the problem is often more important than what optimization algorithm you choose (within reason, of course).

NanoComp / meep

Optimization of multiple objectives #2520