PowerGridModel / power-grid-model

Python/C++ library for distribution power system analysis
Mozilla Public License 2.0
143 stars 30 forks source link

[FEATURE] *Validity Check and Bad Data Handling in State Estimation* #641

Open sudo-ac opened 3 months ago

sudo-ac commented 3 months ago

The new feature should introduce a function to calculate the expected value $E$ and the actual value of the minimization function $J(x)$. If $J(x)$ lies within the 3-sigma band, the result of the state estimation is considered valid. Additionally, bad data should not be included in the determination of $J(x)$, or should only be considered with an adjusted weight in the calculation of $J(x)$.

Detailed Description:

  1. Calculation of Expected Value $E$ and Variance of $E$:

    • The function should calculate the expected value $E$ as $E = m - n$, where:
      • $m$ is the number of measurements.
      • $n$ is the number of state variables ( = 2 * Number of Busses - Slack )
    • The variance $\sigma_J^2$ should be calculated as $\sigma_J^2 = 2(m - n)$. so: $\sigma_J = sqrt(2*(m-n))$
  2. Calculation of Minimization Function $J(x)$:

    • The function should compute ( J(x) ) as the sum of the weighted squared residuals:

      $J(x) = \sum_{i=1}^{m} \left( \frac{r_i}{\sigma_i} \right)^2$

    • The residual $r_i$ is defined as $r_i = z_i - h_i(x)$, where $z_i$ is the measurement and $h_i(x)$ is the predicted value based on the state $x$.

  3. Validity Check within the 3-Sigma Band:

    • Check if $J(x)$ lies within the 3-sigma band: $E - 3\sigma_J < J(x) < E + 3\sigma_J$.
    • If $J(x)$ lies within this range, the result of the state estimation is considered valid.
  4. Handling Bad Data:

    • Bad data can be detected by evaluating the residual $r_i$. If $|r_i|$ exceeds a threshold, the data is considered bad. The threshold can be defined as follows: $|r_i| > \alpha \cdot \sigma_i $, where $\alpha$ is typically in the range of 3 to 5.
    • Bad data should be excluded from the calculation of $J(x)$ or included with an adjusted weight $w$. The adjusted weight can be calculated as:

      $w_i^{\text{new}} = \frac{w_i}{1 + \alpha \cdot \frac{r_i^2}{\sigma_i^2}}$

sudo-ac commented 3 months ago

see also #377

TonyXiang8787 commented 3 months ago

Hi @sudo-ac,

This is an interesting proposal. If I understand correctly, this is the normalized residual test, right?

Your expected value $E$ is the number of over-determined measurements in the system.

sudo-ac commented 3 months ago

Thank you for your interest. Yes, you are correct, this is the normalized residual test. The expected value, as mentioned, is indeed the number of over-determined measurements in the system. This approach helps in validating the state estimation by ensuring that the computed residuals fall within the acceptable range defined by the 3-sigma band. Additionally, handling bad data appropriately will enhance the accuracy and reliability of the state estimation.

TonyXiang8787 commented 2 months ago

Dear @sudo-ac,

Thanks for your input. We are considering implementing this in the following order:

  1. Try your method in some real-world state estimation study cases in research scripts. Evaluate the adjust the mathematics.
  2. Implement this bad data detection in the Python side of the PGM.
  3. Move the implementation to C++ core.

I have two questions for you still.

Formula

Can you have a look again on the formulas you proposed? Some of them do not make completely sense. For example, $$E - 3 \sigma_J$$ is always negative. So the left comparison $$E - 3 \sigma_J < J(x) $$ always holds? Also, when you calculate the $$ J(x)$$, do you actually need to put a square root?

Please have a check on the formulas and make the adjustment if needed.

Numerical example

Can you give us some numerical examples to show how your algorithm will work?

sudo-ac commented 2 months ago

Yes, but please give me some time to examine an example.

sudo-ac commented 2 months ago

Now I see your problem. Here is an example:

First of all, there was a mistake in the calculation of the variance. The variance should be calculated as follows: $\sigma^2 = 2 \times (m - n) $

Bus 1-3: U = 12.5 kV

    1*                2 
G1->|--x-----L_12--x-|<-G2
    x               x
     .             .
       L_13    L_24
         .      .
          x___x->L1
            3

Bus 1 = Slack x: Powerflow - Measurement P, Q

3 Nodes, 3 Branches

Measurements $m$:

3Voltage (Bus1 - Bus3) + 6 2 (P, Q) + 3 * 2 (P, Q-Measurement for Generation at Bus 1 and 2, Load at Bus 3) = 21 $m = 3 + 12 + 6 = 21$

State Variables $n$:

$n = 2 \times \text{Nodes} - \text{Slack} = 2 \times 3 - 1 = 5$

Expected Value $E$:

$E = m - n = 21 - 5 = 16$

Variance $\sigma^2$:

$\sigma^2 = 2 \times (m - n) = 32$ $\sigma = \sqrt{32} \approx 5.66$

Confidence Interval: $3 \times \sigma = 3 \times 5.66 = 16.98$ -> This results in a negative value when subtracted from the expected value:

$16 - 16.98 = -0.98$

Therefore, one must use the chi-squared distribution.

from scipy.stats import chi2

# Degrees of Freedom
df = 16

# Confidence Level 99.8% (3 sigma band)
alpha = 0.002  # 1 - 0.998

# Calculation of critical values
critical_value_lower = chi2.ppf(alpha / 2, df)
critical_value_upper = chi2.ppf(1 - alpha / 2, df)

print(f"Lower critical value: {critical_value_lower}")
print(f"Upper critical value: {critical_value_upper}")

gives:

Lower critical value: 3.9416278434807315
Upper critical value: 39.252354790768464

$J(x) = \sum_{i=1}^{m} \left( \frac{r_i}{\sigma_i} \right)^2$

should be in the interval [3.94, 39.25].

Conclusion: The given equation for the variance is only valid if the number of degrees of freedom $E$ is greater than 30 ( the chi-squared distribution approximates a normal distribution if $E > 30$, see E. Handschin "Elektrische Energie Übertragungssystem" Page 175ff). This condition is typically met for practical grids. For smaller systems, the chi-squared distribution must be used.