Open bdestombe opened 4 years ago
The issue of sequential calibration underlies a design decision in pyfocs: we chug through the data sequentially in order to keep the computational overhead of the calibration low. Something to consider is that at a certain point DTS data tends to become so large that the computational overhead is unreasonably expensive even within a framework like Dask.
Hi, yes you are right and I agree with you, but I think there are some use cases that would still benefit from it.
Currently, almost every part of the DTS calibration routine is, or can be, run with Dask, meaning that you can do most of the calibration on personal computers with limited memory. The exception is the (sparse) least-squares routines: they require `X` (the coefficient matrix), `y` (the observations), and `w` (the weights) to be completely loaded into memory to estimate the unknown parameters `p_sol` and their covariance `p_cov`. But our matrices are very tall: we have many more observations than unknown parameters.
Thus what we can do is take a first selection of observations `y_1`, `w_1`, and `X_1`, and obtain a first estimate `p_sol_1` and `p_cov_1`. For the next calibration we use `p_sol_1` and `p_cov_1` as the prior, together with a second selection of observations `y_2`, `w_2`, and `X_2`, to obtain `p_sol_2` and `p_cov_2`. We can keep chaining them together until all observations have been used. The final estimates at the end of the chain, `p_sol_final` and `p_cov_final`, are the same as when all observations are used at once (`p_sol` and `p_cov`).
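The chaining above can be sketched in information (inverse-covariance) form, where each batch's contribution is simply added to the prior's. This is a minimal illustration, not the dtscalibration implementation; the names `wls_update`, `p_prior`, and `cov_prior` are made up for this sketch:

```python
import numpy as np

def wls_update(X, y, w, p_prior=None, cov_prior=None):
    """One chained weighted-least-squares step in information form.

    Combines an optional prior estimate (p_prior, cov_prior) with a new
    batch of observations (X, y, w). Without a prior this is plain WLS.
    """
    XtW = X.T * w            # equivalent to X.T @ diag(w)
    info = XtW @ X           # information added by this batch
    rhs = XtW @ y
    if p_prior is not None:
        info_prior = np.linalg.inv(cov_prior)
        info = info + info_prior
        rhs = rhs + info_prior @ p_prior
    cov = np.linalg.inv(info)
    return cov @ rhs, cov

# Synthetic tall problem: many observations, few parameters.
rng = np.random.default_rng(0)
n_obs = 200
X = rng.normal(size=(n_obs, 3))
p_true = np.array([1.0, -2.0, 0.5])
w = rng.uniform(0.5, 2.0, size=n_obs)            # weights = inverse variances
y = X @ p_true + rng.normal(scale=w**-0.5)

# All observations at once.
p_full, cov_full = wls_update(X, y, w)

# The same observations in two chained batches.
p_1, cov_1 = wls_update(X[:80], y[:80], w[:80])
p_2, cov_2 = wls_update(X[80:], y[80:], w[80:], p_prior=p_1, cov_prior=cov_1)

print(np.allclose(p_2, p_full), np.allclose(cov_2, cov_full))  # True True
```

The chained result matches the all-at-once result because the information matrices and right-hand sides are additive over batches.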
I see several applications that would benefit from this. First, it reduces the maximum memory needed. Second, you can build a surrogate model: use a limited number of observations to obtain `p_sol_1` and `p_cov_1`, use those for a first estimate of the temperature, and do your georeferencing or prepare other workflows, after which the calibration can be continued without processing `y_1`, `w_1`, and `X_1` again.
Third, you can better transfer knowledge/uncertainty from previous calibration setups to new setups, which is helpful if, for example, a calibration bath proved to be unreliable.
Fourth, operational applications with near-real-time temperatures. Since you're continuously updating `p_sol` and `p_cov` with only a few observations at a time, each computation is relatively small and you could estimate the temperature quite quickly.
If the solution is continuously updated, would this then mean that the first estimates are less reliable than later estimates?
Would this also mean that dtscalibration would need a method for reading/writing solutions/covariances? For instance, in reference to point 4: we don't keep a persistently running processing script but run it regularly using a trigger (e.g., every hour) in order to update our server. Being able to start a new instance of calibration using the previously found `p_sol_1` and `p_cov_1` would be useful in that scenario.
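For the triggered-run workflow, persisting the latest estimate and covariance between runs could be as simple as writing them to disk. This is only a sketch: dtscalibration does not currently expose such a method, and the file name and variable names here are hypothetical:

```python
import os
import tempfile
import numpy as np

# Estimates from the previous triggered (e.g. hourly) run; values are
# placeholders standing in for a real calibration result.
p_sol = np.array([1.0, -2.0, 0.5])
p_cov = np.eye(3) * 0.01

# End of run: write the state to a hypothetical state file.
state_file = os.path.join(tempfile.mkdtemp(), "calib_state.npz")
np.savez(state_file, p_sol=p_sol, p_cov=p_cov)

# Start of the next run: load the stored estimates and use them as the
# prior for the next batch of observations.
state = np.load(state_file)
p_prior, cov_prior = state["p_sol"], state["p_cov"]
print(np.array_equal(p_prior, p_sol))  # True
```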
Yes, the parameter estimates improve with each new instance of calibration. I think the case you're describing would definitely benefit from it. It would be interesting to see how the parameter uncertainties reduce after several calibration instances. I would expect the uncertainties to reduce asymptotically, so that after a certain number of calibration instances they barely reduce any further. Note that you could always choose to re-calibrate the first few instances using the final parameter sets and their covariance as the prior.
I have the solver ready and it works. It still needs a bit of documentation. Plus, it should be faster than the previously implemented solvers while using less memory. But it would require a bit of thinking on how to pass the priors to the calibration functions.
**Is your feature request related to a problem? Please describe.**

**Describe the solution you'd like**
`p0_sol` and `p0_cov` as a priori arguments.

**Describe alternatives you've considered**
Fixing parameters works well, but neglecting covariance has downsides.

**Additional context**
Chapters 1 and 2 of John L. Crassidis and John L. Junkins. 2011. *Optimal Estimation of Dynamic Systems, Second Edition* (Chapman & Hall/CRC Applied Mathematics & Nonlinear Science) (2nd ed.). Chapman & Hall/CRC.