Closed by MiguelMValero 3 months ago
Hi Miguel,
Following our discussion this morning: the small bit of equation $\left( \sum_{i=1}^{nV} \left( \mathbf{y}_i - \mathcal{H}\mathbf{x}_i^f \right)^2 \right)^{1/2}$ is the definition of the so-called $L_2$ (Euclidean) norm, i.e. the canonical distance between the observation $\mathbf{y}$ and the model forecast in this $nV$-dimensional space.
The $L_2$ norm may or may not be the best choice. It would be interesting to compute other kinds of error based on $L_p$ norms, i.e. $\left( \sum_{i=1}^{nV} \left| \mathbf{y}_i - \mathcal{H}\mathbf{x}_i^f \right|^p \right)^{1/p}$. Of course, the normalization can also be done using the same $p$ in the denominator.
Nota bene: typically $p \in \mathbb{N}$, but $p \in \mathbb{Q}$ is also possible.
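As a minimal sketch of what comparing different $L_p$ distances could look like (the array values below are purely illustrative, not from any test case):

```python
import numpy as np

def lp_distance(y, hx, p=2):
    """Generalized L_p distance between observations y and the mapped
    forecast H x^f, both given as 1-D arrays of length nV."""
    return np.sum(np.abs(y - hx) ** p) ** (1.0 / p)

# Hypothetical observation / forecast vectors, for illustration only.
y = np.array([1.0, 2.0, 3.0])
hx = np.array([1.1, 1.9, 3.2])

d2 = lp_distance(y, hx, p=2)  # Euclidean (L_2) distance
d1 = lp_distance(y, hx, p=1)  # L_1 ("taxicab") distance
```

Note the absolute value inside the sum: it is required for odd $p$, and for $p = 2$ it reduces to the Euclidean case.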
Normally, when dealing with EnKF-based techniques, one of the fundamental aspects in quantifying the synchronization between the simulation and the high-fidelity data is the estimation of an error $NRMSD$ based on $NRMSD \propto \left(\mathbf{y}_i - \mathcal{H}\mathbf{x}^f_i \right)$.
However, how should this error be defined? We should probably use different expressions depending on the test case and on the order of magnitude of the different variables being observed, which makes it quite difficult to homogenize the definition.
My proposal is to define a separate error for each of the variables in play, that is, $U_x$, $U_y$, $U_z$, $p$, and $C_f$. My definition of $NRMSD$ is based on https://en.wikipedia.org/wiki/Root-mean-square_deviation, but with some differences. So, for $nV$ observations of each variable:
$$NRMSD_{nV} = \frac{1}{nV} \sqrt{\frac{\displaystyle\sum_{i = 1}^{nV} \left(\mathbf{y}_i - \mathcal{H}\mathbf{x}_i^f \right)^2}{\displaystyle\sum_{i = 1}^{nV} \left(\mathbf{y}_i^2 + \varepsilon \right)}}$$
with $\varepsilon = 10^{-9}$ to make sure the denominator is never zero.
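A minimal NumPy sketch of this definition (one call per observed variable, e.g. one for $U_x$, one for $p$, etc.; the function name is my own, not from any existing codebase):

```python
import numpy as np

EPS = 1e-9  # epsilon from the definition above, guards against a zero denominator

def nrmsd(y, hx_f):
    """NRMSD between the nV observations y of one variable and the mapped
    forecast H x^f, following the normalized definition given above."""
    nV = y.size
    num = np.sum((y - hx_f) ** 2)      # sum of squared innovations
    den = np.sum(y ** 2 + EPS)         # normalization by the observation magnitude
    return np.sqrt(num / den) / nV

# Illustrative use: a perfect forecast gives NRMSD = 0.
y = np.array([0.3, 0.5, 0.4])
print(nrmsd(y, y))
```

Computing one such value per variable sidesteps the scale mismatch between, say, velocity components and pressure, since each is normalized by its own observations.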