sudo-ac commented 3 months ago

The new feature should introduce a function to calculate the expected value $E$ and the actual value of the minimization function $J(x)$. If $J(x)$ lies within the 3-sigma band, the result of the state estimation is considered valid. Additionally, bad data should not be included in the determination of $J(x)$, or should only be considered with an adjusted weight in the calculation of $J(x)$.

Detailed Description:

Calculation of Expected Value $E$ and Variance of $E$:
- The function should calculate the expected value $E$ as $E = m - n$, where:
  - $m$ is the number of measurements.
  - $n$ is the number of state variables ( = 2 * Number of Busses - Slack )
- The variance $\sigma_J^2$ should be calculated as $\sigma_J^2 = 2(m - n)$. so: $\sigma_J = sqrt(2*(m-n))$
Calculation of Minimization Function $J(x)$:
- The function should compute ( J(x) ) as the sum of the weighted squared residuals:
  
  $J(x) = \sum_{i=1}^{m} \left( \frac{r_i}{\sigma_i} \right)^2$
- The residual $r_i$ is defined as $r_i = z_i - h_i(x)$, where $z_i$ is the measurement and $h_i(x)$ is the predicted value based on the state $x$.
Validity Check within the 3-Sigma Band:
- Check if $J(x)$ lies within the 3-sigma band: $E - 3\sigma_J < J(x) < E + 3\sigma_J$.
- If $J(x)$ lies within this range, the result of the state estimation is considered valid.
Handling Bad Data:
- Bad data can be detected by evaluating the residual $r_i$. If $|r_i|$ exceeds a threshold, the data is considered bad. The threshold can be defined as follows: $|r_i| > \alpha \cdot \sigma_i $, where $\alpha$ is typically in the range of 3 to 5.
- Bad data should be excluded from the calculation of $J(x)$ or included with an adjusted weight $w$. The adjusted weight can be calculated as:
  
  $w_i^{\text{new}} = \frac{w_i}{1 + \alpha \cdot \frac{r_i^2}{\sigma_i^2}}$

$w_i = \frac{1}{\sigma_i^2}$

sudo-ac commented 3 months ago

see also #377

TonyXiang8787 commented 3 months ago

Hi @sudo-ac,

This is an interesting proposal. If I understand correctly, this is the normalized residual test, right?

Your expected value $E$ is the number of over-determined measurements in the system.

sudo-ac commented 3 months ago

Thank you for your interest. Yes, you are correct, this is the normalized residual test. The expected value, as mentioned, is indeed the number of over-determined measurements in the system. This approach helps in validating the state estimation by ensuring that the computed residuals fall within the acceptable range defined by the 3-sigma band. Additionally, handling bad data appropriately will enhance the accuracy and reliability of the state estimation.

TonyXiang8787 commented 2 months ago

Dear @sudo-ac,

Thanks for your input. We are considering implementing this in the following order:

Try your method in some real-world state estimation study cases in research scripts. Evaluate the adjust the mathematics.
Implement this bad data detection in the Python side of the PGM.
Move the implementation to C++ core.

I have two questions for you still.

Formula

Can you have a look again on the formulas you proposed? Some of them do not make completely sense. For example, $$E - 3 \sigma_J$$ is always negative. So the left comparison $$E - 3 \sigma_J < J(x) $$ always holds? Also, when you calculate the $$ J(x)$$, do you actually need to put a square root?

Please have a check on the formulas and make the adjustment if needed.

Numerical example

Can you give us some numerical examples to show how your algorithm will work?

sudo-ac commented 2 months ago

Yes, but please give me some time to examine an example.

sudo-ac commented 2 months ago

Now I see your problem. Here is an example:

First of all, there was a mistake in the calculation of the variance. The variance should be calculated as follows: $\sigma^2 = 2 \times (m - n) $

Bus 1-3: U = 12.5 kV

    1*                2 
G1->|--x-----L_12--x-|<-G2
    x               x
     .             .
       L_13    L_24
         .      .
          x___x->L1
            3

Bus 1 = Slack x: Powerflow - Measurement P, Q

3 Nodes, 3 Branches

Measurements $m$:

3Voltage (Bus1 - Bus3) + 6 2 (P, Q) + 3 * 2 (P, Q-Measurement for Generation at Bus 1 and 2, Load at Bus 3) = 21 $m = 3 + 12 + 6 = 21$

State Variables $n$:

$n = 2 \times \text{Nodes} - \text{Slack} = 2 \times 3 - 1 = 5$

Expected Value $E$:

$E = m - n = 21 - 5 = 16$

Variance $\sigma^2$:

$\sigma^2 = 2 \times (m - n) = 32$ $\sigma = \sqrt{32} \approx 5.66$

Confidence Interval: $3 \times \sigma = 3 \times 5.66 = 16.98$ -> This results in a negative value when subtracted from the expected value:

$16 - 16.98 = -0.98$

Therefore, one must use the chi-squared distribution.

from scipy.stats import chi2

# Degrees of Freedom
df = 16

# Confidence Level 99.8% (3 sigma band)
alpha = 0.002  # 1 - 0.998

# Calculation of critical values
critical_value_lower = chi2.ppf(alpha / 2, df)
critical_value_upper = chi2.ppf(1 - alpha / 2, df)

print(f"Lower critical value: {critical_value_lower}")
print(f"Upper critical value: {critical_value_upper}")

gives:

Lower critical value: 3.9416278434807315
Upper critical value: 39.252354790768464

$J(x) = \sum_{i=1}^{m} \left( \frac{r_i}{\sigma_i} \right)^2$

should be in the interval [3.94, 39.25].

Conclusion: The given equation for the variance is only valid if the number of degrees of freedom $E$ is greater than 30 ( the chi-squared distribution approximates a normal distribution if $E > 30$, see E. Handschin "Elektrische Energie Übertragungssystem" Page 175ff). This condition is typically met for practical grids. For smaller systems, the chi-squared distribution must be used.

PowerGridModel / power-grid-model

[FEATURE] Validity Check and Bad Data Handling in State Estimation #641

Formula

Numerical example

PowerGridModel / power-grid-model

[FEATURE] *Validity Check and Bad Data Handling in State Estimation* #641

Formula

Numerical example

[FEATURE] Validity Check and Bad Data Handling in State Estimation #641