PTB-M4D / PyDynamic

Python library for the analysis of dynamic measurements
https://ptb-m4d.github.io/PyDynamic/
GNU Lesser General Public License v3.0
25 stars 13 forks source link

Wrong calculation of covariance matrix in `UMC_generic` #305

Closed mgrub closed 1 year ago

mgrub commented 1 year ago

Describe the bug UMC_generic has an error in the calculation of the covariance matrix after the first block. The covariance therefore has a major mismatch to a basic MC implementation if only one (or a few) blocks are evaluated. If the number of blocks is big, this error wears off due to averaging and inherent random fluctuations.

To Reproduce

from PyDynamic.uncertainty.propagate_MonteCarlo import UMC_generic
import numpy as np
import matplotlib.pyplot as plt

# init
x = np.linspace(0,5,num=11)
ux = np.full_like(x, 0.5)
n_runs = 2000

draw_samples = lambda size: np.random.normal(x, ux, size=(size,x.size))
evaluate = lambda X: X + 1

# PyDynamic
y, Uy, _, output_shape = UMC_generic(draw_samples, evaluate=evaluate, runs=n_runs, blocksize=2000, n_cpu=1) 

# naiv Monte Carlo
results = []
for sample in draw_samples(n_runs):
    result = evaluate(sample)
    results.append(result)
y_mc = np.mean(results, axis=0)
Uy_mc = np.cov(results, rowvar=False)

# comparison
fig, ax = plt.subplots(1, 3)
ax[0].plot(y - y_mc)
ax[1].plot(np.diag(Uy) - np.diag(Uy_mc))
ax[2].imshow(Uy - Uy_mc)
plt.show()

# comparison
print("median y - y_mc   (ideal: 0.0): ", np.median(y - y_mc))
print("median Uy - Uy_mc (ideal: 0.0): ", np.median(Uy - Uy_mc))
print("median diag(Uy) / np.diag(Uy_mc) (ideal: 1.0): ", np.median(np.diag(Uy) / np.diag(Uy_mc)))

Exemplary Output:

median y - y_mc   (ideal: 0.0):  -0.014300805445340181
median Uy - Uy_mc (ideal: 0.0):  0.8981734069108699
median diag(Uy) / np.diag(Uy_mc) (ideal: 1.0):  1987.8304933280958

As can be seen, if only one block (of 2000 samples) is executed, the elements of the main diagonal are roughly factor 2000 apart from the basic MC implementation.

Potential Solution Here, the code should normalize by the number of samples in the current block.

Environment (please complete the following information):