KratosMultiphysics / Kratos

Kratos Multiphysics (a.k.a. Kratos) is a framework for building parallel, multi-disciplinary simulation software. Modularity, extensibility and HPC are its main objectives. Kratos is released under a BSD license and is written in C++ with an extensive Python interface.
https://kratosmultiphysics.github.io/Kratos/

Adjoint element API design #377

Closed msandre closed 7 years ago

msandre commented 7 years ago

Following a suggestion from @RiccardoRossi, I'm working to improve the API for adjoints in Kratos. This topic is dedicated to the element interface. Elements conforming to this interface should be able to work with adjoint solvers and in combination with other elements in Kratos.

In order to provide all the necessary information to the scheme, we are currently using elem->Calculate(var, adjoint_matrix) and returning different matrices depending on the variable var.

For static problems I am proposing to use the variables:

For the unsteady problem it is more complicated since we need to specify the step of the residual vector and the derivative of the variable (e.g., derivative 0 = disp, derivative 1 = vel, etc.). For this I propose:

This is essentially the same as we are doing now for the fluid but more abstract.

jcotela commented 7 years ago

I think this is a very good idea but, if we are talking about designing an interface, why not add the functions to the element base class? @pooyan-dadvand, @RiccardoRossi do you think this is justified in this case?

I would go with something like

Element::AdjointMatrix(Matrix&, int Step = 0);
Element::ShapeDerivativeMatrix(Matrix&, int Step = 0);
Element::AdjointResidualDerivative(Matrix&, int Order, int Step = 0);

Then the same functions for Condition. We can have them return some kind of "Not Implemented" error message for the base classes.
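
For illustration, the base-class defaults could look roughly like the following. This is only a minimal sketch: the exact signatures and the KRATOS_ERROR usage are assumptions, not an agreed API.

// Sketch: default implementations in the Element base class that fail loudly
// until a derived element overrides them.
virtual void AdjointMatrix(Matrix& rOutput, int Step = 0)
{
    KRATOS_ERROR << "AdjointMatrix is not implemented for this element.";
}

virtual void ShapeDerivativeMatrix(Matrix& rOutput, int Step = 0)
{
    KRATOS_ERROR << "ShapeDerivativeMatrix is not implemented for this element.";
}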

I would not go with "Calculate", in any case. We are abusing it enough already to do all kinds of stuff that is specific to a given problem. What you are proposing is standard enough that it could be abstracted, and I think it deserves its own interface.

RiccardoRossi commented 7 years ago

I think the addition is justified; however, I would think carefully about what would be optimal in this context. What would be the ideal parameters to pass? How do we change the objective function? I say this because "shape" is very specific to the shape optimization business, and I was wondering whether it would be possible to have it "generic" for other optimization apps.

pooyan-dadvand commented 7 years ago

+1 for @RiccardoRossi

msandre commented 7 years ago

Adding the functions would definitely be a better interface so let's focus on that. The objective function is outside of the element. I will try to open a separate issue on that after this one is settled.

Regarding Element::ShapeDerivativeMatrix(), this is very specific and we could make it more generic. The question would be: what if we have an element that can support both shape and topology optimization? How would the generic function know which type of optimization is being performed and which is the correct contribution to calculate? Would it be better to pass some additional information in, say, the process info, or to define separate functions? @dbaumgaertner do you have any suggestions from your work on topology optimization? Would Element::CalculateSensitivityContributions(Matrix&, ProcessInfo&) be acceptable?

About the adjoint contributions: these are more complicated, and I now think the order parameter is too confusing. In my opinion it would be better to define functions analogous to CalculateLeftHandSide, CalculateMassMatrix and CalculateDampingMatrix, called either CalculateAdjointLeftHandSide(Matrix&, ProcessInfo&, int step=0), CalculateAdjointMassMatrix(Matrix&, ProcessInfo&, int step=0), CalculateAdjointDampingMatrix(Matrix&, ProcessInfo&, int step=0) or, more generally, CalculateAdjointFirstDerivativesContributions(Matrix&, ProcessInfo&, int step=0) and CalculateAdjointSecondDerivativesContributions(Matrix&, ProcessInfo&, int step=0). The step parameter is only needed if we consider time discretizations such as Newmark, Bossak or generalized-alpha. I attached an explanation of the adjoint contributions and the purpose of these functions: adjoint_element_api.pdf
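
For concreteness, a sketch of how the more general variant could be declared on the element. This is illustrative only; parameter names mirror the existing CalculateLeftHandSide/CalculateMassMatrix/CalculateDampingMatrix style and nothing here is final.

// Sketch of the proposed declarations (names from the comment above, not a final API):
virtual void CalculateAdjointLeftHandSide(Matrix& rLHS, ProcessInfo& rProcessInfo, int Step = 0);
// derivative of the residual w.r.t. the solution variables

virtual void CalculateAdjointFirstDerivativesContributions(Matrix& rLHS, ProcessInfo& rProcessInfo, int Step = 0);
// derivative of the residual w.r.t. the first time derivatives of the solution

virtual void CalculateAdjointSecondDerivativesContributions(Matrix& rLHS, ProcessInfo& rProcessInfo, int Step = 0);
// derivative of the residual w.r.t. the second time derivatives of the solution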

jcotela commented 7 years ago

I would like to ask for a small clarification... In your view, how is the adjoint problem solved? If it is a matter of building a regular FEM problem, but using the adjoint problem contributions, I would try to use the standard Element API. Say I want to do shape optimization for a fluid problem. I have the standard FEM fluid problem and the adjoint problem. I solve the fluid and now I want to solve the adjoint. If it is a matter of building a different FEM problem, why can't I define an AdjointElement (probably inheriting from the fluid element, so that I can access its internals) and use the regular AdjointElement::MassMatrix, AdjointElement::LocalSystem... and so on? I guess I'm missing something, but that is the simplest approach I can think of...

msandre commented 7 years ago

If we squeeze it enough we could make it work like that, but I'm not a big fan of that approach for the following reasons:

Overall, I don't think it would be impossible to do it like this, but for the above reasons don't like the design.

jcotela commented 7 years ago

OK I understand better now. I think these are good reasons. Anyway, just to clarify, who manages all of this? Is there a strategy that calls these specialized optimization functions?

msandre commented 7 years ago

The scheme does the bulk of the work and knows how to assemble the adjoint based on time discretization. Objective functions are called by the scheme and are implemented in a separate class. We still have to define the interface for the objective function. They also work at the element level. The strategy is just the usual residual based linear strategy. Actually, everything above the scheme level is quite standard despite the fact that we go backwards in time.

jcotela commented 7 years ago

Final question (hopefully): what is the relationship between the objective function and the element in your design? Do we want the element to know about the objective function (so we should pass it somehow)? Or do we want the objective function to know about the element?

I think any design we propose will eventually involve some kind of objective function class...

msandre commented 7 years ago

Yes, the objective function works on the element almost like a scheme. It is neither defined in the element nor in the scheme because different optimization problems need different objective functions for the same element and time discretization (scheme). The idea is to define a base objective function class with a standard api that takes an element and adjoint system contributions and returns the local contribution to the objective function gradient. A similar interface is given for the sensitivity which is updated in the scheme's update function. This way, the element never knows about the objective function since it is unnecessary for every element to have a reference to this. The objective function knows about the element, but only at the time that the local element contributions are needed.

dbaumgaertner commented 7 years ago

Some comments from my side and from an optimizer perspective:

Let me first try to sort things out a bit. Generally, we may categorize the sensitivities under discussion into two different types: the ones involving a response function J (objective or constraint) and the ones involving the governing equations R. So we have dJdx and dRdx. Furthermore, in both cases we may distinguish sensitivities w.r.t. the design variables x and sensitivities w.r.t. the state variables, say u. Consequently, we also have dJdu and dRdu. Finally, we also have the corresponding partial derivatives: pJpx, pJpu, pRpx, pRpu.

Other than that, it is obviously important for the sensitivity analysis which type of design variables we have. That is, to have a generic approach in the sense of covering different types of optimization (shape, topology, thickness, …), one would need to introduce an abstract class for the type of design variable. So x describes not just the coordinates; we may assume there are also other optimization variables (x = thickness, x = density, …).

Having sorted out the members involved, a few comments on the implementation.

I would say that, whatever the response function is, it in any case represents an abstract concept and hence should have a corresponding abstract base class from which all further response functions are derived. This is because, from an optimizer point of view, the only things it really needs to do are:

  1. Get/Evaluate response function value (say drag)
  2. Get/Evaluate gradient dJdx (if one wants to use a deterministic approach).

So, the response function base class could enforce an implementation of these two functions. These two functions could then internally compute the necessary information and output the results in a standardized(!) format.

More precisely, the internal computations within each response function could be structured into four different functionalities following four basic steps:

  1. Compute the response value by calling specialized functions (e.g. compute drag or sum over all elements to get total area / volume / mass)
  2. Compute all partial response function gradients (pJpu, pJpx)
  3. Provide the generic adjoint solver with pJpu and start it to compute lambda*pRpx (with lambda being the adjoint field)
  4. Sum the parts from 2) and 3) to get dJdx = pJpx + lambda*pRpx, which is the gradient we are looking for. This gradient information then may be provided to the calling function in a standard format.
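
A minimal sketch of what such a base class could enforce follows. The class and method names here are placeholders for illustration, not a settled interface; the four steps above would live behind the gradient function.

// Placeholder sketch of an abstract response function (names are illustrative).
class ResponseFunction
{
public:
    virtual ~ResponseFunction() {}

    // Step 1: evaluate the response value J (e.g. drag, total mass, strain energy).
    virtual double CalculateValue(ModelPart& rModelPart) = 0;

    // Steps 2-4: assemble dJdx = pJpx + lambda * pRpx into a standardized output.
    virtual void CalculateGradient(ModelPart& rModelPart, Vector& rGradient) = 0;
};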

Note that in the four steps above, I assumed x is the coordinates, as we are thinking of it now. But having an abstract design variable class, we could simply use it and multiply dJdx with some contribution that is specific to the type of design variable (like contributions from the Vertex Morphing method, a CAD parametrization or others). By that, we would have the most generic optimization/sensitivity capabilities possible.

Also, just to have things complete here, we must not forget that an optimization may involve mesh motion, meaning we also need to take into account sensitivities from the mesh-motion solver. But much like the contribution of the design variables, this may be added rather easily.

Having said this, I would conclude for now:

The calculation of the gradients related to the governing equations should go down to the elemental level, since these are generic and may be used for any type of optimization and, once in a while, also for improving solution algorithms. So here I would really try to get something like Element::CalculateJacobian() for pRpu and Element::CalculateGeometricSensitivity() for pRpX.

The response function should call the necessary adjoint solver, which in turn provides the required information by running the implemented scheme or strategy. Which adjoint solver is used is then decided by the response function (if at all, because for structures there is occasionally no need for an extra linear adjoint solve, e.g. in the mass and strain energy optimization implemented so far). The adjoint solver itself, of course, may require additional generic elemental functions, which @msandre knows much better than I do.

Assuming we have a response function object and it can return dJdx, the optimizers in Kratos (topology or shape) may be readily used for an optimization. But once we get there, we could also think of merging these two optimizers.

Last but not least, I also very much like the idea of making this a generic process. The reason is simple: once the response function concept is generalized, one could add new response functions with only little effort. Moreover, one could use new elements "automatically" for an optimization, provided they also implement the mentioned elemental gradients.

jcotela commented 7 years ago

OK, so, trying to understand the requirements (and trying to follow the notation of @dbaumgaertner):

Now, to compute derivatives of R with respect to X, the element needs to know what we are optimizing for. How easy/difficult is it to compute pRpX in a generic way? What I mean by this is: @dbaumgaertner is referring to Element::CalculateGeometricSensitivity for pRpX. Is this generic enough? Will we need different functions for different design variables?

armingeiser commented 7 years ago

Hello all,

pRpX is totally dependent on the design variable X, so either we have to implement a specific function (e.g. Element::CalculateGeometricSensitivity for X being coordinates) for each kind of design variable (shape, thickness...) or pass some kind of flag to this function (e.g. Element::CalculateSensitivity(GEOMETRY)).

For structural problems we are in the lucky position that we can calculate dRdX with a local finite difference at the element level, giving good results. This so-called semi-analytic approach has the advantage that we don't need to develop and implement Element::CalculateGeometricSensitivity for each combination of element and design variable.
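
To make the semi-analytic idea concrete, a rough sketch for X = nodal coordinates follows. Everything here is illustrative: the free function, the use of CalculateRightHandSide as the residual and the fixed perturbation size are assumptions for the sketch, not existing Kratos API.

// Sketch: elemental pRpX by forward finite differences of the residual w.r.t.
// each nodal coordinate. Rows = perturbed coordinates, columns = residual entries.
void CalculateGeometricSensitivity(Element& rElement, Matrix& rOutput,
                                   ProcessInfo& rProcessInfo, const double Delta = 1e-8)
{
    Vector R0, R;
    rElement.CalculateRightHandSide(R0, rProcessInfo); // unperturbed residual

    const unsigned int dim = 3;
    auto& r_geometry = rElement.GetGeometry();
    rOutput.resize(dim * r_geometry.size(), R0.size(), false);

    unsigned int row = 0;
    for (auto& r_node : r_geometry)
        for (unsigned int d = 0; d < dim; ++d)
        {
            r_node.Coordinates()[d] += Delta;                 // perturb one coordinate
            rElement.CalculateRightHandSide(R, rProcessInfo);
            r_node.Coordinates()[d] -= Delta;                 // restore the geometry
            for (unsigned int j = 0; j < R0.size(); ++j)
                rOutput(row, j) = (R[j] - R0[j]) / Delta;
            ++row;
        }
}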

I don't know exactly what the situation is for fluid problems, but I assume this is not possible there?

RiccardoRossi commented 7 years ago

Hi Armin,

This is exactly the point I wanted to address in the discussion:

Is there any way we can express through the function call what kind of dRdx we want? I think that exactly for this reason it would be inappropriate to specify it in the name, since every derivative would require a specific function.

cheers Riccardo

dbaumgaertner commented 7 years ago

You guys are right. I think I missed this part since I was focusing too much on shape optimization here. So dRdx generally depends on the choice of the design variable. That also means that the elemental functions to compute this need to know the design variable type. But we may again distinguish two different situations here:

1) We are given a shape optimization problem. In this case, dRdx always refers to a derivative w.r.t. the coordinates. If x is any other set of shape parameters "v" (say some CAD parameters), then according to the chain rule dRdv = dRdx*dxdv. So in this case, the response function class would call the elements to compute dRdx = Element::ComputeGeometricSensitivity(), subsequently call the design variable class to compute dxdv = DesignVariable::ComputeParameterSensitivity() and finally just multiply both (see the sketch after case 2) below).

2) For all other optimization cases, where we need a special computation of dRdx, e.g. a derivative of the shell equations w.r.t. the shell thickness in a thickness optimization, an extra function would need to be added to the element, something like Element::ComputeThicknessSensitivity(). For the element to know that this sensitivity is required, one could set a flag (THICKNESS_SENSITIVITY_REQUIRED or something) at the moment when we define our design variables. The same flag could also be read in the response function class itself to check whether we need to add additional terms (like the dxdv from above, or others from the mesh motion). Indeed, for every type of optimization one would then need to explicitly implement the different sensitivities. But I think this makes sense, because obviously not all elements are usable for all optimization types (there is e.g. no thickness in the case of solids).
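
To illustrate case 1), a small sketch of how the response function could chain the two pieces. ComputeGeometricSensitivity and ComputeParameterSensitivity are the hypothetical functions proposed above, and p_element, p_design_variable and r_process_info are assumed to exist in the calling scope.

// Case 1): chain rule dRdv = dRdx * dxdv, assembled outside the element.
Matrix dRdx;   // partial derivative of the residual w.r.t. the nodal coordinates
Matrix dxdv;   // derivative of the coordinates w.r.t. the CAD/shape parameters
p_element->ComputeGeometricSensitivity(dRdx, r_process_info);       // hypothetical
p_design_variable->ComputeParameterSensitivity(dxdv, *p_element);   // hypothetical
const Matrix dRdv = prod(dRdx, dxdv); // sizes: (residual x coords) * (coords x params)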

One final comment on this flag. Since we are at an early stage, we should try to cover some important practical situations that can occur. One situation would be that design variables are mixed (thickness + density, for example). This would imply that the flag above is an "elemental flag" which decides, for every element, whether a specific sensitivity is needed. I am thinking of something very practical, e.g. one decides to incorporate the thickness of the shell elements 253 to 300 into the set of design variables. In doing so, internally the flag THICKNESS_SENSITIVITY_REQUIRED is set for all these elements. If one then decides to additionally include the density of elements 280 to 350 in the set of design variables, another flag could be set accordingly. And so on and so forth.

I don't think this is the complete picture yet, and for sure I am missing something here. Also, I guess structuring the optimization towards something more general requires a physical meeting at some point. But we are not that far off, and I think it is worth settling this once at the beginning, to be flexible in the future.

Cheers

msandre commented 7 years ago

Ok, I think we are all on the same track as far as using a generic function for pRpX goes. The question is how to define it. I'm in favor of using Kratos variables, since we anyway have these defined for the current design variables, e.g. Variable<Kratos::array_1d<double, 3> > SHAPE_SENSITIVITY is used to store the sensitivities on the nodes. If the design variables are homogeneous then we could simply use

CalculateAdjointSensitivityContributions(Matrix&, const Variable<Kratos::array_1d<double, 3> >&, ProcessInfo&)

CalculateAdjointSensitivityContributions(Matrix&, const Variable<double>&, ProcessInfo&)

In the mixed example above, I would make two calls:

CalculateAdjointSensitivityContributions(pRpX, THICKNESS, ProcessInfo)
CalculateAdjointSensitivityContributions(pRpX, DENSITY, ProcessInfo)

and combine them outside of the element. This way the interface to the element is kept simple and transparent. I would try to keep a constant block structure of design variables over all elements and just zero out or ignore the sensitivities that you don't want to include in the optimization since their computation is quite cheap and this avoids a lot of branching and complexity inside the element.

msandre commented 7 years ago

sorry, the second function prototype above should be a double variable. I'm still having issues with markdown.

jcotela commented 7 years ago

In terms of design, I see we have three different concepts we need to represent somehow: the response function (objective or constraint), the element (FEM formulation, residual) and the design variable. Ideally, I would like to represent each of these with a class. However, I also get the impression that each of these three things is tightly coupled to the other two: for example, you need to know both the internal details of the element and of the design variable to calculate the adjoint sensitivities. Is this the case?

I think that, given our constraints, defining functions on the element as you propose may be the best option. What I don't really like is the idea of passing the design variable as a flag, because I feel it will end in huge element classes with lots of branching, which is a pain to maintain and update. I would prefer to use derivation here (a base element deriving into a shape optimization element, a thickness optimization element and so on). Another alternative, if we can somehow identify common needs in the different types of optimization, would be to encapsulate the "shape optimization details", "thickness optimization details" and so on into separate auxiliary classes and pass those as members (or template parameters) to a common optimization element. Do you think something like this could work?
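
Just to make the last alternative tangible, a rough sketch of the "auxiliary class" idea. All names here are hypothetical; the point is only that the optimization-specific details would live outside the element.

// Hypothetical policy class holding the "shape optimization details".
struct ShapeOptimizationDetails
{
    static void CalculateSensitivityContribution(Element& rElement, Matrix& rOutput,
                                                 ProcessInfo& rProcessInfo);
};

// Hypothetical common optimization element, templated on the primal element
// and on the optimization details.
template <class TPrimalElement, class TOptimizationDetails>
class OptimizationElement : public TPrimalElement
{
public:
    void CalculateSensitivityContribution(Matrix& rOutput, ProcessInfo& rProcessInfo)
    {
        TOptimizationDetails::CalculateSensitivityContribution(*this, rOutput, rProcessInfo);
    }
};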

msandre commented 7 years ago

We only need to know the design variable in the last step (after we solve the adjoint problem) to get the sensitivity. So the sensitivity is directly connected to the design variable. The design variable itself doesn't have internals that I can think of.

I think the number of design variables for a particular element will be small. For example, a shell element might have shape and thickness design parameters, a solid element might have shape and density, and a fluid element might have shape and porosity or something like this. I think the common case will be shape (nodal coordinates) regardless of element type. I'm not sure if a full auxiliary class is the way to go. I'll try to think about this.

msandre commented 7 years ago

What if we had the standard case of nodal sensitivities and derived specialized elements for less standard cases?

jcotela commented 7 years ago

Just to clarify, I'm not opposed to the flag-based design, I'm just trying to find alternatives because I think it will be harder to maintain. Something I'm not really sure about is how hard it is to adapt a given problem to a new type of design variable. Say we have a set of governing equations R and two (sets of) design variables, X and Y. If we need to obtain pRpX and pRpY, are these two completely different operators, or is there some common ground, pRpX = pRpZ * pZpX and pRpY = pRpZ * pZpY, where Z are some element internals (shape functions? solution u?)?

jcotela commented 7 years ago

By the way, when I was speaking of "design variable internals" I was thinking in terms of these pZpX.

msandre commented 7 years ago

Nodal sensitivities can be used, as Daniel pointed out, to derive other sensitivities based on CAD parameters or others. But we can't do this (pRpY = pRpZ * pZpY) for cases like shell thickness, because the shell thickness is not related to the nodal coordinates. Such cases are specific to a particular element model, since the thickness parameter only makes sense for a shell.

RiccardoRossi commented 7 years ago

Hi,

All of this discussion is very relevant. I would suggest, however, proposing here some prototypes of the API "in code form", since the discussion would be more concrete once we see the form the functions would take. Only my idea though... so if you prefer to go on "abstract", as you wish.

cheers Riccardo

jcotela commented 7 years ago

Well, I think we are slowly converging towards an API. However, I'm very grateful that you all still have the patience to answer my "abstract" questions to clarify what problem the API has to solve.

msandre commented 7 years ago

The discussion and questions have been very helpful so far so thanks for everyone's comments. I think we have covered most of the aspects of the element api. The aim is not to converge 100% here, but to get close to the optimum and then refine things when we start looking at the code. I find it easier to keep the discussion organized here and thus hopefully converge faster by focusing on one issue at a time in the initial phase. I would still like to discuss the geometry interface for obtaining pRpX and the response function. Then I would suggest we start a branch and discuss the code there. In the end, these threads should be organized in the project page so we can refer back to them in the future.

RiccardoRossi commented 7 years ago

@jcotela, i guess that my post sounded rude. That was not my intention, sorry about that.

I think, however, that we do have an issue with how the actual function should look. For example, should the derivative always be a matrix? Of what size?

I can foresee at least 3 cases (but there are probably more):

  1. "x" is a NODAL scalar, like "PRESSURE","DENSITY"
  2. "x" is a single value PER ELEMENT, say "YOUNG MODULUS"
  3. "x" is a "vector value" per every node. Say "DISPLACEMENT".

MAYBE one option to solve this would be to provide a list of dofs as input for the nodal case. Not sure if it is a good idea... just exposing my first thoughts here.

How would the output variable be defined? For example, for case 2 I guess it would be a rectangular matrix, is that so?

What if what we are optimizing for is a direction? (Imagine for example we want to decide how to orient the strongest axis of a beam.)

My fear is that once we take this into account, the number of functions needed in the interface grows very fast. This is the reason why I was asking you to detail the input/output API for the calls you are proposing.

jcotela commented 7 years ago

@RiccardoRossi no offense taken. You are right that we need to address this issue of the output sizes and organization, but I wanted to clarify some things first.

In any case, the impression I get so far is that the design is converging towards element contributions. This suggests that one possibility is to define equivalents of EquationIdVector and GetDofList that would help manage the organization of the matrices of the adjoint problem. So, a first proposed API, from the suggestions collected so far, would be

CalculateAdjointSensitivityContributions(Matrix&, const Variable<Kratos::array_1d<double, 3> >&, ProcessInfo&)

Where the Matrix is the local contribution to the adjoint problem and the Variable tells us what we are optimizing for. Alternatively, we could define some kind of flag in place of the Variable (this would avoid having to define overrides for different variable types, which I don't think makes much sense in this case).

I propose this could have some associated functions:

GetAdjointSensitivityEquationIdVector(EquationIdVector&, const Variable<Kratos::array_1d<double, 3> >&, ProcessInfo&)
GetAdjointSensitivityDOFs(EquationIdVector&, const Variable<Kratos::array_1d<double, 3> >&, ProcessInfo&)

That would help us know what we are optimizing for.

A separate question is what the main output would be for pRpX. Do we need to be able to access that from the API? (I guess yes?) If so, I assume this will be some kind of matrix. How do we organize it? (One row/column per variable?)

RiccardoRossi commented 7 years ago

hi @jcotela, my 2 cents:

1 - You will need to consider different types of variables (double and array_1d).
2 - If the variable you derive with respect to is "nodal", then I think it would make sense to return the dof list in the same call. The EquationId is not really needed, since you can always recover it from the dof list.
3 - An important point, however, is that the output matrix may well be rectangular; how do we identify which dofs correspond to rows and which to columns?

A different problem is the case in which the value we derive with respect to is not nodal. Think of an elemental property, such as the orientation of a beam or the Young's modulus. What would we do in this case? No dofs are available, since there is no node to associate them with...

msandre commented 7 years ago

I'm not understanding the reason for having dofs and equation ids here. Whatever X is, assume our function CalculateAdjointSensitivityContributions() gives the elemental pRpX. Now the only thing we need to do is get the adjoint solution vector for this element and multiply it with the transpose of pRpX and store the result locally (on the nodes or element). It's true that for the adjoint problem we need to assemble and have adjoint dofs. But the dependence on design variables doesn't appear until the Update() in the scheme (the simplest example is in adjoint_steady_scheme.h).

msandre commented 7 years ago

I just checked and it's currently in FinalizeSolutionStep(), not Update().

RiccardoRossi commented 7 years ago

Hi Mike,

then most probably I am not understanding the discussion.

If R is the residual, its ordering is prescribed by the GetDofList of the element. However, who gives the information about the ordering of the X variables? For example it could be node1 DISPL_X, node2 DISPL_X, node1 DISPL_Y, node2 DISPL_Y, or node1 DISPL_X, node1 DISPL_Y, node2 DISPL_X, node2 DISPL_Y.

If you eventually want to assemble such a derivative, you'll have to know this order.

Am I getting it wrong?

msandre commented 7 years ago

Ok, I understand what you are saying. The reason why I didn't see this as necessary is that I never considered assembling this into a global vector.

I assume the sensitivity can be expressed as a double or an array_1d on the node or the element, and identified with SOME_VARIABLE. Now, if it is in the element, then the function returns pRpX with the size of X being either 1 for a double or the dimension of the array_1d. If it is on the node, then X is nnode x 1 for a double or nnode x dim for an array_1d.

msandre commented 7 years ago

It isn't a problem to compute more sensitivities after solving the adjoint. If we want sensitivities for SOME_VARIABLE1, SOME_VARIABLE2, etc., we just loop over all the variables, calculate the sensitivity, and store it on the node/element for that variable. In the end this step is cheap because we don't need to solve anything. This might be more information than the optimizer needs. In this way, there might be some sort of list of design variables, but I would keep this outside of the element.

RiccardoRossi commented 7 years ago

Hi Mike, this means that an element writes to its nodes? That would be an issue for OpenMP parallelism...

msandre commented 7 years ago

Yes. Currently it is done in the scheme, but the same issue occurs for OpenMP, so we need a lock. This is a minor issue in my opinion, since other parts of the solution have higher algorithmic complexity.

RiccardoRossi commented 7 years ago

What if we give back a matrix "A" and do that assembly outside of the element? I say this because the element should not programmatically write onto its variables.

If one wants, one could then easily assemble matrix A to the nodes in the scheme (so that the locks are done at the scheme level instead of within the element itself).

A, for example, could be such that

A(i,j) is pRpX of node i, component j (of the variable of interest).

msandre commented 7 years ago

Yes, this is what we do currently and I would keep it this way. We only need to assemble a vector to the nodes, since we multiply pRpX with the adjoint directly after getting it from the element.

RiccardoRossi commented 7 years ago

Mmm, I think I expressed myself badly. I mean something like the following pseudo-code:

Matrix A;
for (auto it : model_part.Elements())
{
    it->CalculateAdjoint...(A, ....);

    // HERE: assemble A to the nodes.
}

The advantages would be that:

1 - the element does not have write access to its nodes
2 - you can combine multiple CalculateAdjoint functions in the scheme prior to assembling them

msandre commented 7 years ago

Yes, I think I understand this. So, expanding on your code:

Matrix A;
Vector adjoint, tmp;
std::vector<Variable<double>*> design_variables;
...
for (auto it : model_part.Elements())
{
    // get the adjoint vector from the element (e.g., it->GetAdjointValuesVector(adjoint))
    ...
    for (auto p_current_variable : design_variables)
    {
        it->CalculateAdjointSensitivityContributions(A, *p_current_variable, pinfo);
        tmp = prod(A, adjoint);
        for (unsigned int i=0; i < it->GetGeometry().size(); ++i)
            it->GetGeometry()[i].FastGetSolutionStepValue(*p_current_variable) += tmp[i];
    }
}

RiccardoRossi commented 7 years ago

Hi @msandre,

What you write makes sense; however, there will be "painful" details about that. For example, we will often have to deal with a mix of Variable<double> and Variable<array_1d<double,3> >. Maybe @pooyan-dadvand can suggest something about that? (The concrete question is how to pass a list of variables of different types.)

On a slightly different issue, I am once again not clear on the sizes of the vectors here.

Let's consider for example that we have a tet element with 3 velocities and 1 pressure per node. As I understand it, the "adjoint" vector will have 16 variables here.

Does this mean that the matrix A would always need to have a 16x16 size?

Can I ask you to give a very concrete example, for instance a steady-state fluid case where our design variables are the positions of the nodes on the surface (so a shape optimization problem)? Pick a different one if you like; I was proposing this example since we have 3 "x" (the components of the position) and 4 variables in the residual (vx, vy, vz, p).

msandre commented 7 years ago

Hey Riccardo,

Sorry for the slow response. There were some urgent issues this week. The size of the adjoint vector is as you say. The adjoints are one to one with the primals. So for each variable in the regular problem there will be exactly one adjoint variable. Here the matrix A is the transpose of pRpX, which I will call pRpXt, and is not always 16x16. Take your example of steady or unsteady fluid case with design variables the position of the nodes on the surface. For a tet element there will be 16 adjoint variables and 12 design variables (3 coordinates per node) so the matrix is 12x16.

Let's assume that the CalculateAdjointSensitivityContributions() function can only handle one variable per call, so that we don't need some special dof vector to decode the entries of pRpX. Then there is the possibility of overloading it for the two variable types (Variable<double> and Variable<array_1d<double,3> >).

Matrix pRpXt;
Vector adjoint, scalar_sensitivity, vector_sensitivity;
std::vector<Variable<double>*> scalar_design_variables;
std::vector<Variable<array_1d<double,3>>*> vector_design_variables;
...
for (auto it : model_part.Elements())
{
    // get the adjoint vector from the element (e.g., it->GetAdjointValuesVector(adjoint))
    ...
    for (auto p_current_variable : scalar_design_variables)
    {
        it->CalculateAdjointSensitivityContributions(pRpXt, *p_current_variable, pinfo);
        scalar_sensitivity = prod(pRpXt, adjoint);
        for (unsigned int i=0; i < it->GetGeometry().size(); ++i)
            it->GetGeometry()[i].FastGetSolutionStepValue(*p_current_variable) += scalar_sensitivity[i];
    }

    for (auto p_current_variable : vector_design_variables)
    {
        it->CalculateAdjointSensitivityContributions(pRpXt, *p_current_variable, pinfo);
        vector_sensitivity = prod(pRpXt, adjoint);
        unsigned k = 0;
        for (unsigned int i=0; i < it->GetGeometry().size(); ++i)
        {
            array_1d<double,3>& var = it->GetGeometry()[i].FastGetSolutionStepValue(*p_current_variable);
            for (unsigned d=0; d<3; ++d)
                var[d] += vector_sensitivity[k++];
        }
    }
}

I know there are other ways to do this, but this is the way I think would be clearest to Kratos developers, since we use standard types and the API is similar to the existing element API. It also gives full flexibility in the type of design variables one could define and the order in which they might be defined.

msandre commented 7 years ago

I started a wiki page based on the feedback with the current API design for adjoint elements and response functions at

https://github.com/KratosMultiphysics/Kratos/wiki/Adjoint-API

@jcotela's original suggestion to use the existing element API is followed as much as possible, so that we don't abuse the Calculate() functions too much, we don't need to add a large number of functions to the element base class, and we don't need to modify the existing element implementations.

Another concern was the need to implement a new adjoint element for each structural element. To avoid this, one can create a single templated adjoint element (the structural element is the template parameter) that forwards the mass, damping and stiffness matrices and use this to generate all the adjoint elements. The finite differencing can also be added in this element for sensitivities.
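
A condensed sketch of that templated wrapper follows. It is schematic only: the real implementation also has to handle transposition, dof ordering and the finite-difference sensitivities, and the exact base-class signatures are the ones documented on the wiki page.

// Schematic: one adjoint element templated on the primal structural element.
template <class TPrimalElement>
class TemplatedAdjointElement : public Element
{
public:
    void CalculateFirstDerivativesLHS(Matrix& rLHS, ProcessInfo& rProcessInfo)
    {
        // forward to the primal damping matrix (adjoint-specific handling omitted)
        mPrimalElement.CalculateDampingMatrix(rLHS, rProcessInfo);
    }

    void CalculateSecondDerivativesLHS(Matrix& rLHS, ProcessInfo& rProcessInfo)
    {
        // forward to the primal mass matrix
        mPrimalElement.CalculateMassMatrix(rLHS, rProcessInfo);
    }

private:
    TPrimalElement mPrimalElement; // the wrapped structural element
};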

Unfortunately, there are no standard names for the matrices and vectors involved in the adjoint problem and the sensitivity calculation. My logic for naming is based on the fact that each adjoint equation (there can be up to 3 in general) comes from taking the partial derivative of the Lagrangian w.r.t. either the solution variables, their first derivatives or their second derivatives at the current time step. The adjoint element provides the left-hand-side matrix contributions of the adjoint problem, while the response function adds the non-homogeneous part to the right-hand sides. Based on this, I chose the element functions CalculateLeftHandSide, CalculateFirstDerivativesLHS and CalculateSecondDerivativesLHS, and named the functions in the response function class CalculateGradient, CalculateFirstDerivativesGradient and CalculateSecondDerivativesGradient.

There are two additional element functions that are needed in the base class (the CalculateSensitivityMatrix functions). The response function stores the design variables and is used to compute the sensitivities: it calls the element's CalculateSensitivityMatrix functions and updates the sensitivities on the nodes or Gauss points.
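
Collecting the names mentioned above, a summary sketch of the resulting interface follows. The class names and argument lists are simplified and only illustrative; the wiki page holds the authoritative signatures.

// Summary sketch (simplified argument lists; see the wiki for the real API).
class AdjointElementInterface // element-side functions
{
public:
    virtual void CalculateLeftHandSide(Matrix& rLHS, ProcessInfo& rInfo);          // w.r.t. solution variables
    virtual void CalculateFirstDerivativesLHS(Matrix& rLHS, ProcessInfo& rInfo);   // w.r.t. first time derivatives
    virtual void CalculateSecondDerivativesLHS(Matrix& rLHS, ProcessInfo& rInfo);  // w.r.t. second time derivatives
    virtual void CalculateSensitivityMatrix(const Variable<array_1d<double, 3>>& rDesignVariable,
                                            Matrix& rOutput, ProcessInfo& rInfo);  // partial w.r.t. a design variable
};

class ResponseFunctionInterface // response-function-side functions
{
public:
    virtual void CalculateGradient(const Element& rElement, const Matrix& rAdjointMatrix,
                                   Vector& rGradient, ProcessInfo& rInfo);
    virtual void CalculateFirstDerivativesGradient(const Element& rElement, const Matrix& rAdjointMatrix,
                                                   Vector& rGradient, ProcessInfo& rInfo);
    virtual void CalculateSecondDerivativesGradient(const Element& rElement, const Matrix& rAdjointMatrix,
                                                    Vector& rGradient, ProcessInfo& rInfo);
};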

pooyan-dadvand commented 7 years ago

Seems reasonable at first glance! Good job! 👍

msandre commented 7 years ago

I'm closing this issue now that the changes to the adjoint based on the above discussion are integrated into master and described in the wiki. Future changes will be discussed in separate issues.