new run_* source file needed for VAF sensitivity to velocities

dngoldberg commented 1 year ago

A new source file is needed in order to calculate VAF sensitivity to velocities. It will closely model run_errorprop.py, and will likely be very similar up to this line where the equation

$$\sigma_T^2 = \bigg(\frac{\partial QT}{\partial\hat{m}}\bigg)^T H{approx}^{-1} \bigg(\frac{\partial Q_T}{\partial\hat{m}}\bigg)$$

is implemented. This final inner product needs to be replaced by a multiplication of the mixed Hessian to get the result:

$$-H{\hat{p}\hat{m}} H{approx}^{-1} \bigg(\frac{\partial Q_T}{\partial\hat{m}}\bigg)$$

where

$$H_{\hat{p}\hat{m}} = \frac{\partial^2 J}{\partial p_i \partial m_j}$$

where $\hat{m}$ are controls and $\hat{p}$ are velocity cloud values.

update 30 Nov to give origin of above equation

Starting with

$$\frac{\partial J}{\partial\hat{m}} = 0,$$

a total differential yields

$$ \frac{\partial^2 J}{\partial\hat{m}^2} \delta\hat{m} + \frac{\partial^2 J}{\partial\hat{m}\partial\hat{p}} \delta\hat{p} \equiv H{\hat{m}\hat{m}} \delta\hat{m} + H{\hat{m}\hat{p}} \delta\hat{p} = 0. $$

Solving for $\delta\hat{m}$ yields

$$ \delta\hat{m}=- H{\hat{m}\hat{m}}^{-1} H{\hat{m}\hat{p}}\delta\hat{p}, $$

which upon rearrangement gives $\frac{ \delta\hat{m}}{\delta\hat{p}}$. The expression above is then found by

$$ \frac{\partial Q}{\partial \hat{p}} = \frac{\partial Q}{\partial \hat{m}}\frac{\partial \hat{m}}{\partial \hat{p}}$$

as $Q$ has no direct dependence on $\hat{p}$.

end update 30 Nov

There are 2 key things to note: (1) $H_{\hat{p}\hat{m}}$ is not symmetric/self adjoint. So it is not clear how to construct the mixed Hessian operator, and (2) $\hat{p}$ lives in a bespoke function space which was created solely to manipulate point clouds.

This issue depends on annotation of observation points in comp_J_inv.

UPDATE

the cost function formed in comp_J_inv is $J_u + J_v$ where

$$Ju = \frac{1}{2}(\mathbf{P}\mathbf{D}u - u{obs})^T \mathbf{R} (\mathbf{P}\mathbf{D}u - u_{obs})$$

where $\mathbf{D}$ is a projection to a DG1 space, and similar for $J_v$. Expanding this out, the only relevant term is

$$-u^T\mathbf{D}^T\mathbf{P}^T\mathbf{R}u_{obs}$$.

$H_{\hat{p}\hat{m}}$ is thus

$$\frac{\partial}{\partial m}(\mathbf{A}u) = \mathbf{A}\frac{\partial u}{\partial m}$$

where $\mathbf{A} = -\mathbf{R}\mathbf{P}\mathbf{D}$.

update 29 November

The target derivative sought is therefore

$$\mathbf{R}\mathbf{P}\mathbf{D} \frac{\partial u}{\partial m} H_{approx}^{-1} \bigg(\frac{\partial Q_T}{\partial\hat{m}}\bigg)$$

bearecinos commented 1 year ago

reset_manager()
stop_manager(tlm=False)
configure_tml((m, zeta))
solver.forward(m)
tau = function_tml(solver.u, (m, zeta))

Then we compute $$-A.tau$$

Amat_obs_action(solver._cached.P_mat, R_vec, u, solver._cache....)

dngoldberg commented 1 year ago

reset_manager()
stop_manager(tlm=False)
configure_tml((m, zeta))
solver.forward(m)
tau = function_tml(solver.u, (m, zeta))

Then we compute −A.tau

Amat_obs_action(solver._cached.P_mat, R_vec, u, solver._cache....)

I cannot remember how the u_stderr and v_stderr are domain-decomposed.. i wonder if i can just use the variables solver.py or need to cache them?

dngoldberg commented 1 year ago

@jrmaddison I have drafted new source to address this but I am so unsure about the types that Im not prepared to test yet, as errors could arise for so many reasons.

I have pushed a new runscript to my branch, which diverges from run_errorprop.py here where i read the cached variables from solver.py. I would be grateful if you could actually look at lines 162-180 to let me know what you think. I suspect that the direction of the TLM gradient must be a function, which is why I have defined P3. I also assume the return of function_tml (tau) is a function in the same space as solver.u.

Is function_tml in the previous comment a typo?

jrmaddison commented 1 year ago

Yes, function_tlm to access a tangent-linear variable (https://tlm-adjoint.github.io/autoapi/tlm_adjoint/manager/index.html#tlm_adjoint.manager.function_tlm).

Tangent-linear variables are objects in the same space as the original forward, so a tangent-linear variable associated with a primal Function is a primal Function. This is because when we perturb $m \rightarrow m + \varepsilon \delta m$, with $\delta m$ the direction defining the tangent-linear and $\varepsilon$ some scalar, the linear change to a forward variable $u$ is $u \rightarrow u + \varepsilon \tau_u$ where $\tau_u$ is the tangent-linear variable. For the addition to make sense they must be in the same vector space.

Some comments on the draft:

The direction must be a Function, not a Vector. It's OK here to instantiate a new Function and then set it via .vector()[:] =. The only time this isn't OK is if more detailed tlm_adjoint flags (e.g. for caching) are set -- which they won't be unless you explicitly set them.
I recommend factoring this into a separate function (i.e. callable) decorated with restore_manager (https://tlm-adjoint.github.io/autoapi/tlm_adjoint/manager/index.html#tlm_adjoint.manager.restore_manager), along the lines of the following. This let's the forward mode calculation use it's own manager, and avoid interfering with any other AD calculations that might be going on (although there might still be side-effects via the ssa_solver object).
```
@restore_manager
def compute_tau(forward, u, m, dm):
set_manager(EquationManager(cp_method="none", cp_parameters={}))
stop_manager()

start_manager(tlm=True)
configure_tlm((m, dm))
forward(m)
return function_tlm(u, (m, dm))
```

EDIT: Code fix

dngoldberg commented 1 year ago

many thanks for looking james. ill try to create the new python function as suggested.

one thing -- i thought that, as coded, the direction is a (fenics) function:

P3 = Function(space)
P3.vector() = P2.vector() - P1.vector()
...
configure_tml((cntrl, P3))
tau = function_tml(solver.U, (cntrl, P3))

were you implying that this was done incorrectly? or that it is OK but not best practice?

jrmaddison commented 1 year ago

Oops, yes it is.

dngoldberg commented 1 year ago

Hi @jrmaddison @bearecinos i have run the new script with the ISMIPC experiment in serial mode. As of now i can only say that it does not throw runtime errors. James's suggestion to create a separate function for the TLM calculation turned out to be necessary (thanks James!) to avoid TLM runtime errors the second time the TLM is calculated.

I am not creating a PR now -- i will leave this now and work on visualising the outputs for ISMIPC.

James: am I correct that the utilities for Taylor testing will not work for this gradient (of VAF with respect to velocities) calculation, and a test would have to be created manually?

jrmaddison commented 1 year ago

James: am I correct that the utilities for Taylor testing will not work for this gradient (of VAF with respect to velocities) calculation, and a test would have to be created manually?

Taylor verification would probably need to be hand coded, as the observations are not `annotatable' objects and the tlm_adjoint interface functions cannot interact with them.

dngoldberg commented 1 year ago

hi @jrmaddison @bearecinos there are 2 functions nested within fenics_ice/model/vel_obs_from_data(): interpolate() and interp_weights(). I can use them to visualise the observation dependencies, but not if they are nested.. is there any good reason i should not bring them outside of class model? the alternative is caching vertices and weights returned by interp_weights() and duplicating interpolate().

jrmaddison commented 1 year ago

Looks like model.vel_obs_from_data should be factored out. It's both reading data and referencing the data on the model. Those should be separate operations, in separate functions/methods.

dngoldberg commented 1 year ago

ok i will try to do so and then ask you guys to take a look..?

dngoldberg commented 1 year ago

hi @jrmaddison @bearecinos the code is now "working" in parallel, as in, it is not crashing. But the simple test i was using wont' work in parallel (for other reasons) so Im fumbling a bit blindly just now. I have one issue and one query:

ISSUE: the end result is a numpy array of the length of obs_local (the number of observations belonging to the proc). Im not sure if it NEEDS to be this way, but it is the way i wrote it. The issue is that I need to somehow to create a global array from the local arrays. Is there a correct way to do this other than just using mpi4py calls?

UPDATE: i have used an mpi reduce call to gather the local results. this is unreviewed and untested.

QUESTION: @jrmaddison this line from Amat_obs_action contains of matrix-vector multiplication of the local interpolation matrix P by the local vector of the DG(1) function. I assume this is OK, as (by construction) all columns of P corresponding to non-local observation points are zero?

EdiGlacUQ / fenics_ice

new run_* source file needed for VAF sensitivity to velocities #120