fancompute / workshop-invdesign

📐 Workshop material for optical inverse design and automatic differentiation
MIT License

Is automatic differentiation the same as the adjoint method? #3

Closed · maweibest closed this 4 years ago

maweibest commented 4 years ago

Very good tutorial. It's amazing that inversely designed devices with such non-intuitive structures can realize these optical functionalities. In your implementation, it seems that you simply use automatic differentiation to track all the operations involving the design parameters (the epsilon distribution), including the FDFD simulation, and backpropagate to find the derivatives. How is this connected to the adjoint method? In the slides, you mentioned that the adjoint equation is set up similarly to the EM simulation but with a different source. Where is the adjoint equation in your implementation?

twhughes commented 4 years ago

Thanks for the question! The adjoint method is equivalent to 'reverse-mode' automatic differentiation, which is the default gradient calculation in ceviche. Some of our earlier packages, such as angler, define the adjoint explicitly, but this is quite tedious, especially for complex problems. In the automatic differentiation approach, we hard-code the adjoint for each elementary operation, and then at run time the program constructs one large adjoint problem behind the scenes, based on how we've called each of these operations in our code.
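To make that concrete, here is a minimal sketch (a toy objective, not the workshop's FDFD solver) of what requesting a gradient looks like with autograd:

```python
import autograd.numpy as np   # thin wrapper: same numpy API, but traceable
from autograd import grad

def objective(eps):
    # stand-in for a simulation: a chain of elementary numpy operations
    field = np.sin(eps) ** 2
    return np.sum(field / (1.0 + eps ** 2))

# grad() records each operation at run time, then sweeps the recorded graph
# in reverse, applying each operation's pre-defined adjoint rule.
d_objective = grad(objective)
print(d_objective(np.linspace(0.0, 1.0, 5)))
```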

To give a more specific example: when we import the numpy package, which provides most of the elementary operations, we actually import a numpy wrapper from the autograd package. In autograd, the adjoint equations for most numpy operations are pre-defined in the source code. In ceviche, we've also added the adjoint equations for solving sparse linear systems (solving Ax=b for x), which are needed for the frequency-domain solvers. These are the adjoints most commonly seen in EM simulation papers on the adjoint method.

When a user wants the gradient of an objective function that uses an electromagnetic field solution, the code records each operation that goes into the calculation, looks up the adjoint equation for each one, and uses them to construct a large 'computational graph', which essentially specifies the 'adjoint equation' for that objective function. When we evaluate the gradient at a specific set of parameters, we plug the values from the forward pass into this automatically constructed adjoint equation, which gives us the actual gradient.
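A minimal sketch of registering such an adjoint rule (illustrative only; the names here are made up, a dense solve stands in for ceviche's sparse one) uses autograd's `primitive`/`defvjp` API:

```python
import numpy as onp                     # "raw" numpy for the solve itself
import autograd.numpy as np
from autograd import grad
from autograd.extend import primitive, defvjp

@primitive
def solve(A, b):
    """Forward pass: x = A^{-1} b (dense here for illustration)."""
    return onp.linalg.solve(A, b)

# Vector-Jacobian products: given v = dF/dx from downstream operations, the
# adjoint of the solve is another solve with the transposed system.
defvjp(
    solve,
    # w.r.t. A: the contribution is -(A^{-T} v) x^T
    lambda ans, A, b: lambda v: -onp.outer(onp.linalg.solve(A.T, v), ans),
    # w.r.t. b: just the transposed solve, A^{-T} v
    lambda ans, A, b: lambda v: onp.linalg.solve(A.T, v),
)

# Usage: gradients of any objective built on `solve` now work automatically.
A = onp.array([[2.0, 1.0], [0.0, 3.0]])
objective = lambda b: np.sum(solve(A, b) ** 2)
print(grad(objective)(onp.array([1.0, 2.0])))
```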

While the automatic differentiation approach makes things much easier and simpler from a programming standpoint, the physical interpretation of the adjoint problem (for example, the new source) is still there; it's just buried deeper in the code. As an example, here is where we define the adjoint equation for solving Ax=b for x. If we were differentiating a function f(x), then v in this line would be ∂f/∂x. You can see that, just like in the typical adjoint case from electromagnetics, the adjoint problem involves solving the same system (now transposed), where the right-hand side is replaced by -∂f/∂x = -v. So the physics of the adjoint source is still valid; it's just abstracted into these low-level adjoint 'primitives'.
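For reference, the standard derivation behind that statement (written here as a sketch, in one common sign convention) goes:

```latex
% Forward problem, objective, and downstream sensitivity
A(p)\, x = b, \qquad f = f(x), \qquad v \equiv \frac{\partial f}{\partial x}

% Differentiate the constraint with respect to a parameter p
\frac{\partial A}{\partial p}\, x + A\, \frac{\partial x}{\partial p} = 0
\;\;\Rightarrow\;\;
\frac{\partial x}{\partial p} = -A^{-1}\, \frac{\partial A}{\partial p}\, x

% Chain rule, grouping the solve against v into an adjoint field x_adj
\frac{df}{dp} = v^{\top} \frac{\partial x}{\partial p}
             = x_{\mathrm{adj}}^{\top}\, \frac{\partial A}{\partial p}\, x,
\qquad \text{where } A^{\top} x_{\mathrm{adj}} = -v
```

That last line is exactly the adjoint simulation from the EM literature: the transposed system driven by the source -v.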

For more detail, I'd recommend a recent paper from our group that discusses the connection between automatic differentiation and the adjoint method more deeply, in the context of photonic crystal design. The paper is here.

Hopefully that gives some more context on the connection between the adjoint method and automatic differentiation. TL;DR: they are the same thing, but with automatic differentiation we just specify the adjoint for each individual operation, and the program figures out how to combine them into one big adjoint equation for the whole problem, without needing a human to derive all of that by hand.