Closed dngoldberg closed 5 months ago
reset_manager()
stop_manager(tlm=False)
configure_tml((m, zeta))
solver.forward(m)
tau = function_tml(solver.u, (m, zeta))
Then we compute $$-A.tau$$
Amat_obs_action(solver._cached.P_mat, R_vec, u, solver._cache....)
reset_manager() stop_manager(tlm=False) configure_tml((m, zeta)) solver.forward(m) tau = function_tml(solver.u, (m, zeta))
Then we compute −A.tau
Amat_obs_action(solver._cached.P_mat, R_vec, u, solver._cache....)
I cannot remember how the u_stderr and v_stderr are domain-decomposed.. i wonder if i can just use the variables solver.py or need to cache them?
@jrmaddison I have drafted new source to address this but I am so unsure about the types that Im not prepared to test yet, as errors could arise for so many reasons.
I have pushed a new runscript to my branch, which diverges from run_errorprop.py here where i read the cached variables from solver.py. I would be grateful if you could actually look at lines 162-180 to let me know what you think. I suspect that the direction of the TLM gradient must be a function, which is why I have defined P3. I also assume the return of function_tml
(tau) is a function in the same space as solver.u
.
Is function_tml
in the previous comment a typo?
Yes, function_tlm
to access a tangent-linear variable (https://tlm-adjoint.github.io/autoapi/tlm_adjoint/manager/index.html#tlm_adjoint.manager.function_tlm).
Tangent-linear variables are objects in the same space as the original forward, so a tangent-linear variable associated with a primal Function
is a primal Function
. This is because when we perturb $m \rightarrow m + \varepsilon \delta m$, with $\delta m$ the direction defining the tangent-linear and $\varepsilon$ some scalar, the linear change to a forward variable $u$ is $u \rightarrow u + \varepsilon \tau_u$ where $\tau_u$ is the tangent-linear variable. For the addition to make sense they must be in the same vector space.
Some comments on the draft:
Function
, not a Vector
. It's OK here to instantiate a new Function
and then set it via .vector()[:] =
. The only time this isn't OK is if more detailed tlm_adjoint flags (e.g. for caching) are set -- which they won't be unless you explicitly set them.I recommend factoring this into a separate function (i.e. callable) decorated with restore_manager
(https://tlm-adjoint.github.io/autoapi/tlm_adjoint/manager/index.html#tlm_adjoint.manager.restore_manager), along the lines of the following. This let's the forward mode calculation use it's own manager, and avoid interfering with any other AD calculations that might be going on (although there might still be side-effects via the ssa_solver
object).
@restore_manager
def compute_tau(forward, u, m, dm):
set_manager(EquationManager(cp_method="none", cp_parameters={}))
stop_manager()
start_manager(tlm=True)
configure_tlm((m, dm))
forward(m)
return function_tlm(u, (m, dm))
EDIT: Code fix
many thanks for looking james. ill try to create the new python function as suggested.
one thing -- i thought that, as coded, the direction is a (fenics) function:
P3 = Function(space)
P3.vector() = P2.vector() - P1.vector()
...
configure_tml((cntrl, P3))
tau = function_tml(solver.U, (cntrl, P3))
were you implying that this was done incorrectly? or that it is OK but not best practice?
Oops, yes it is.
Hi @jrmaddison @bearecinos i have run the new script with the ISMIPC experiment in serial mode. As of now i can only say that it does not throw runtime errors. James's suggestion to create a separate function for the TLM calculation turned out to be necessary (thanks James!) to avoid TLM runtime errors the second time the TLM is calculated.
I am not creating a PR now -- i will leave this now and work on visualising the outputs for ISMIPC.
James: am I correct that the utilities for Taylor testing will not work for this gradient (of VAF with respect to velocities) calculation, and a test would have to be created manually?
James: am I correct that the utilities for Taylor testing will not work for this gradient (of VAF with respect to velocities) calculation, and a test would have to be created manually?
Taylor verification would probably need to be hand coded, as the observations are not `annotatable' objects and the tlm_adjoint interface functions cannot interact with them.
hi @jrmaddison @bearecinos there are 2 functions nested within fenics_ice/model/vel_obs_from_data(): interpolate() and interp_weights(). I can use them to visualise the observation dependencies, but not if they are nested.. is there any good reason i should not bring them outside of class model? the alternative is caching vertices and weights returned by interp_weights() and duplicating interpolate().
Looks like model.vel_obs_from_data
should be factored out. It's both reading data and referencing the data on the model
. Those should be separate operations, in separate functions/methods.
ok i will try to do so and then ask you guys to take a look..?
hi @jrmaddison @bearecinos the code is now "working" in parallel, as in, it is not crashing. But the simple test i was using wont' work in parallel (for other reasons) so Im fumbling a bit blindly just now. I have one issue and one query:
ISSUE: the end result is a numpy array of the length of obs_local (the number of observations belonging to the proc). Im not sure if it NEEDS to be this way, but it is the way i wrote it. The issue is that I need to somehow to create a global array from the local arrays. Is there a correct way to do this other than just using mpi4py calls?
UPDATE: i have used an mpi reduce call to gather the local results. this is unreviewed and untested.
QUESTION: @jrmaddison this line from Amat_obs_action contains of matrix-vector multiplication of the local interpolation matrix P by the local vector of the DG(1) function. I assume this is OK, as (by construction) all columns of P corresponding to non-local observation points are zero?
A new source file is needed in order to calculate VAF sensitivity to velocities. It will closely model run_errorprop.py, and will likely be very similar up to this line where the equation
$$\sigma_T^2 = \bigg(\frac{\partial QT}{\partial\hat{m}}\bigg)^T H{approx}^{-1} \bigg(\frac{\partial Q_T}{\partial\hat{m}}\bigg)$$
is implemented. This final inner product needs to be replaced by a multiplication of the mixed Hessian to get the result:
$$-H{\hat{p}\hat{m}} H{approx}^{-1} \bigg(\frac{\partial Q_T}{\partial\hat{m}}\bigg)$$
where
$$H_{\hat{p}\hat{m}} = \frac{\partial^2 J}{\partial p_i \partial m_j}$$
where $\hat{m}$ are controls and $\hat{p}$ are velocity cloud values.
update 30 Nov to give origin of above equation
Starting with
$$\frac{\partial J}{\partial\hat{m}} = 0,$$
a total differential yields
$$ \frac{\partial^2 J}{\partial\hat{m}^2} \delta\hat{m} + \frac{\partial^2 J}{\partial\hat{m}\partial\hat{p}} \delta\hat{p} \equiv H{\hat{m}\hat{m}} \delta\hat{m} + H{\hat{m}\hat{p}} \delta\hat{p} = 0. $$
Solving for $\delta\hat{m}$ yields
$$ \delta\hat{m}=- H{\hat{m}\hat{m}}^{-1} H{\hat{m}\hat{p}}\delta\hat{p}, $$
which upon rearrangement gives $\frac{ \delta\hat{m}}{\delta\hat{p}}$. The expression above is then found by
$$ \frac{\partial Q}{\partial \hat{p}} = \frac{\partial Q}{\partial \hat{m}}\frac{\partial \hat{m}}{\partial \hat{p}}$$
as $Q$ has no direct dependence on $\hat{p}$.
end update 30 Nov
There are 2 key things to note: (1) $H_{\hat{p}\hat{m}}$ is not symmetric/self adjoint. So it is not clear how to construct the mixed Hessian operator, and (2) $\hat{p}$ lives in a bespoke function space which was created solely to manipulate point clouds.
This issue depends on annotation of observation points in
comp_J_inv
.UPDATE
the cost function formed in
comp_J_inv
is $J_u + J_v$ where$$Ju = \frac{1}{2}(\mathbf{P}\mathbf{D}u - u{obs})^T \mathbf{R} (\mathbf{P}\mathbf{D}u - u_{obs})$$
where $\mathbf{D}$ is a projection to a DG1 space, and similar for $J_v$. Expanding this out, the only relevant term is
$$-u^T\mathbf{D}^T\mathbf{P}^T\mathbf{R}u_{obs}$$.
$H_{\hat{p}\hat{m}}$ is thus
$$\frac{\partial}{\partial m}(\mathbf{A}u) = \mathbf{A}\frac{\partial u}{\partial m}$$
where $\mathbf{A} = -\mathbf{R}\mathbf{P}\mathbf{D}$.
update 29 November
The target derivative sought is therefore
$$\mathbf{R}\mathbf{P}\mathbf{D} \frac{\partial u}{\partial m} H_{approx}^{-1} \bigg(\frac{\partial Q_T}{\partial\hat{m}}\bigg)$$