dtscalibration / python-dts-calibration

A Python package to load raw Distributed Temperature Sensing (DTS) files, perform a calibration, and plot the result.
https://python-dts-calibration.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Sequential calibration with support for a priori estimates #110

Open bdestombe opened 4 years ago

bdestombe commented 4 years ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered
Fixing parameters works well, but neglecting their covariance has downsides.

Additional context
Chapters 1 and 2 of John L. Crassidis and John L. Junkins. 2011. Optimal Estimation of Dynamic Systems, Second Edition (Chapman & Hall/CRC Applied Mathematics & Nonlinear Science) (2nd. ed.). Chapman & Hall/CRC.

klapo commented 4 years ago

The issue of sequential calibration underlies a design decision in pyfocs: we chug through the data sequentially in order to keep the computational overhead of the calibration low. Something to consider is that at a certain point DTS data tends to become so large that the computation is unreasonably expensive even within a framework like dask.

bdestombe commented 4 years ago

Hi, yes, you are right, but I think there are some use cases that would still benefit from it.

Currently, almost every part of the DTS calibration routine can be run with Dask, meaning that most of the calibration can be done on personal computers with limited memory. The exception is the (sparse) least-squares routines: they require X (the coefficient matrix), y (the observations), and w (the weights) to be loaded into memory completely in order to estimate the unknown parameters p_sol and their covariance p_cov. But our matrices are very tall; we have many more observations than unknown parameters. So we can take a first selection of observations, y_1, w_1, and X_1, and obtain a first estimate p_sol_1 and p_cov_1. For the next calibration we use p_sol_1 and p_cov_1 as the prior, together with the second selection of observations y_2, w_2, and X_2, to obtain p_sol_2 and p_cov_2. We can keep chaining updates until all observations have been used. The final estimates at the end of the chain, p_sol_final and p_cov_final, are identical to the estimates obtained when all observations are used at once (p_sol and p_cov).
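The chaining described above can be sketched as a weighted least-squares update in information form. This is a minimal NumPy sketch, not the dtscalibration API; the function name `sequential_wls` and the dense array shapes are illustrative assumptions:

```python
import numpy as np

def sequential_wls(X, y, w, p_prior=None, P_prior=None):
    """One sequential weighted-least-squares update (illustrative sketch).

    X : (m, n) coefficient matrix for this batch of observations
    y : (m,) observations
    w : (m,) observation weights (inverse variances)
    p_prior, P_prior : estimate and covariance from the previous batch,
        or None for the first batch.
    Returns the updated estimate p_post and covariance P_post.
    """
    XtW = X.T * w                      # X^T W, with W = diag(w)
    info = XtW @ X                     # information contributed by this batch
    rhs = XtW @ y
    if p_prior is not None:
        P_prior_inv = np.linalg.inv(P_prior)
        info += P_prior_inv            # add the prior's information
        rhs += P_prior_inv @ p_prior
    P_post = np.linalg.inv(info)
    p_post = P_post @ rhs
    return p_post, P_post
```

Because the information matrices simply add up, chaining two batches this way reproduces the estimate and covariance of a single solve over all observations, which is the property claimed above.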

I see several applications that would benefit from this:

1. It reduces the maximum memory needed.
2. You can build a surrogate model: use a limited number of observations to obtain p_sol_1 and p_cov_1 and a first estimate of the temperature, do your georeferencing or prepare other workflows, and then continue the calibration without processing y_1, w_1, and X_1 again.
3. You can better transfer knowledge/uncertainty from previous calibration setups to new setups. This is helpful if, for example, a calibration bath proved to be unreliable.
4. Operational applications with near-real-time temperatures: since you are continuously updating p_sol and p_cov with only a few observations at a time, each computation is small and the temperature can be estimated quickly.

klapo commented 4 years ago

If the solution is continuously updated, would this then mean that the first estimates are less reliable than later ones?

Would this also mean that dtscalibration would need a method for reading/writing solutions/covariances? For instance, regarding point 4: we don't keep a persistently running processing script, but run it regularly on a trigger (e.g., every hour) to update our server. Being able to start a new calibration instance from previously found p_sol_1 and p_cov_1 would be useful in that scenario.
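Persisting the prior between triggered runs could be as simple as writing the two arrays to disk. A sketch under assumptions: `save_solution`/`load_solution` and the `.npz` format are hypothetical choices, not an existing dtscalibration method:

```python
import numpy as np

def save_solution(path, p_sol, p_cov):
    # Persist the parameter estimate and its covariance between runs.
    np.savez(path, p_sol=p_sol, p_cov=p_cov)

def load_solution(path):
    # Restore a previously saved estimate/covariance pair to use as prior.
    with np.load(path) as f:
        return f["p_sol"], f["p_cov"]
```

A triggered (e.g. hourly) job would then load the file if it exists, run the sequential update on the new observations, and save the updated solution back.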

bdestombe commented 4 years ago

Yes, the parameter estimates improve with each new instance of calibration, and I think the case you're describing would definitely benefit from it. It would be interesting to see how the parameter uncertainties reduce over successive calibration instances. I would expect them to decrease asymptotically, so that after a certain number of instances the uncertainties barely reduce any further. Note that you could always choose to re-calibrate the first few instances using the final parameter set and its covariance as the prior.
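The expected asymptotic behaviour can be illustrated with a toy experiment: accumulate the information matrix batch by batch and track the total parameter variance (trace of the covariance). Everything here is illustrative (random design matrices, unit weights), not real DTS data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_params, batch_size = 3, 50
info = np.zeros((n_params, n_params))   # accumulated information, sum of X_k^T X_k
total_variance = []
for k in range(20):                     # 20 calibration instances
    X_k = rng.normal(size=(batch_size, n_params))
    info += X_k.T @ X_k                 # unit weights for simplicity
    total_variance.append(np.trace(np.linalg.inv(info)))

# The total variance falls roughly as 1/k: the first instances reduce it
# a lot, later instances barely change it.
```

With k batches of comparable information content the covariance scales like 1/k, so the relative reduction per added instance shrinks, matching the asymptote described above.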

I have the solver ready and it works. It still needs a bit of documentation. It should also be faster than the previously implemented solvers while using less memory. But it would require some thought on how to pass the priors to the calibration functions.