ajdawson / eofs

EOF analysis in Python
http://ajdawson.github.io/eofs/
GNU General Public License v3.0
199 stars 60 forks

Multivariate EOF Variance Discrepancies #121

Closed Boooke closed 3 years ago

Boooke commented 3 years ago

Hey, first off, thanks for developing the eofs package. It has helped me out a lot with performing univariate EOFs.

This may very well be down to my lack of theoretical understanding, so please forgive me if it is. When performing multivariate EOFs (MEOFs), I wanted to see how the explained variance is distributed among the corresponding reconstructed fields, and the result seems off to me. I tried this with both anomaly fields and standardised data as input to the solver.

I compared the two quantities computed below, var_fraction and reconstructed_var. This is done on standardised data specifically so that the SVD's variance fraction and the variance of the reconstructed data should be equal.

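import numpy
from eofs.multivariate.standard import MultivariateEof

# Each data array is standardised beforehand: mean = 0, std = 1
m_solver = MultivariateEof(list_data_arrays, weights=list_wgts)

# Fraction of the total variance explained by the first n modes
var_fraction = numpy.sum(m_solver.varianceFraction(neigs=n))

# Mean variance of the fields reconstructed from the first n modes
# (these numbers don't match var_fraction, even with standardised data)
reconstructed_fields = m_solver.reconstructedField(n)
reconstructed_var = 0
for i in range(N_vars):
    reconstructed_var += numpy.nanvar(reconstructed_fields[i])
reconstructed_var /= N_vars

# Should be around 0
var_fraction - reconstructed_var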
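For reference, the expectation behind this comparison can be checked with plain NumPy, independent of eofs. The following is a minimal sketch on synthetic data, assuming unweighted fields that are flattened to (time, space) and joined column-wise. The names `fields`, `combined` and `recon_fraction` are made up for illustration. It compares the variance fraction from the singular values with the share of the total sum of squares carried by the rank-n reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_space, n_modes = 50, 8, 3

# Two hypothetical fields, already flattened to (time, space)
fields = [rng.standard_normal((n_time, n_space)) for _ in range(2)]

# Standardise each field as a whole, then remove the time mean per column
fields = [(f - f.mean()) / f.std() for f in fields]
combined = np.concatenate(fields, axis=1)
combined = combined - combined.mean(axis=0)

# SVD of the combined anomaly matrix
u, s, vt = np.linalg.svd(combined, full_matrices=False)
var_fraction = np.sum(s[:n_modes] ** 2) / np.sum(s ** 2)

# Rank-n reconstruction and its share of the total sum of squares
recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
recon_fraction = np.sum(recon ** 2) / np.sum(combined ** 2)

print(var_fraction - recon_fraction)  # close to 0 up to rounding
```

In this sketch the two numbers agree because the reconstruction is measured against the total variance of the matrix that was actually decomposed. Comparing a raw variance with a fraction only works when that total is exactly 1 per element, which time-mean removal or weighting can change.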

I have done this with three datasets, in every combination of the three.

If this output is wrong, could it be due to the way the MEOF is computed? The few papers I have found on MEOFs stack the datasets vertically, along the time axis, which as far as I can tell is the opposite of what the MultivariateEof object does.
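To make the two layouts concrete, here is a small NumPy sketch with made-up arrays (`field_a`, `field_b`, `field_c` are hypothetical). It illustrates the shapes only and is not taken from the eofs source. Stacking along the feature axis keeps one row per time step and allows different grids, while stacking along the time axis requires the fields to share a grid.

```python
import numpy as np

n_time = 10
# Two hypothetical fields on different grids: (time, lat, lon)
field_a = np.arange(n_time * 2 * 3, dtype=float).reshape(n_time, 2, 3)
field_b = np.arange(n_time * 4 * 5, dtype=float).reshape(n_time, 4, 5)

# Flatten the spatial dimensions of each field: (time, space)
flat_a = field_a.reshape(n_time, -1)
flat_b = field_b.reshape(n_time, -1)

# Feature-axis stacking: one row per time step, columns from both fields
by_feature = np.concatenate([flat_a, flat_b], axis=1)
print(by_feature.shape)  # (10, 26)

# Time-axis stacking: only possible when the grids match
field_c = field_a + 1.0
by_time = np.concatenate([flat_a, field_c.reshape(n_time, -1)], axis=0)
print(by_time.shape)  # (20, 6)

# A combined spatial pattern splits back into one map per field
pattern = by_feature[0]
map_a = pattern[:flat_a.shape[1]].reshape(field_a.shape[1:])
map_b = pattern[flat_a.shape[1]:].reshape(field_b.shape[1:])
```

With the feature-axis layout, each mode has a single time series shared by all fields, and its spatial pattern is cut back into per-field maps as in the last three lines.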

Regards, Boooke

Boooke commented 3 years ago

It was probably due to my lack of theoretical understanding. I noticed that in my case, as more features are concatenated together, the variance decreases, I guess because the data deviates less from the expected value when similar features are added.
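One simple way a variance figure can drop as fields are concatenated, which is not necessarily what happened here, is that a fixed number of modes has to cover more total variance. The toy sketch below uses two made-up single-column fields with uncorrelated, equal-variance signals. `leading_fraction` is a hypothetical helper written for this example, not part of eofs.

```python
import numpy as np

def leading_fraction(matrix):
    """Fraction of the total variance carried by the first SVD mode."""
    anomalies = matrix - matrix.mean(axis=0)
    s = np.linalg.svd(anomalies, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

t = np.arange(8)
# Two hypothetical fields, one column each, uncorrelated over a full period
field_a = np.cos(2 * np.pi * t / 8)[:, None]
field_b = np.sin(2 * np.pi * t / 8)[:, None]

alone = leading_fraction(field_a)
together = leading_fraction(np.concatenate([field_a, field_b], axis=1))
print(alone, together)
```

On its own, the single field is fully described by one mode. Once the second, unrelated field is added, the first mode can only account for about half of the combined variance.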