Scale data when computing EOFs, or not?

JGCRI / fldgen

Given a global mean temperature pathway, generate random global climate fields consistent with it and with spatial and temporal correlation derived from an ESM

https://jgcri.github.io/fldgen/

GNU General Public License v2.0


Scale data when computing EOFs, or not? #2

Closed: rplzzz closed this issue 6 years ago.

rplzzz commented 6 years ago

The code currently does not scale the residual data to unit variance, nor, for that matter, does it center it. The reason we chose not to scale is that it slightly complicates the procedure for reconstructing fields from the EOFs. However, conventional wisdom is that scaling is advisable.
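To make the "slightly complicates" point concrete, here is a minimal sketch in base R (the language of the original test-r code) of reconstructing fields from EOFs both ways. It is not fldgen's code: the matrix `resids` (rows are time steps, columns are grid cells) is synthetic and hypothetical. With unscaled, uncentered EOFs the fields come back as scores times transposed loadings. With scaled EOFs, each grid cell must also be multiplied by the scale factor that `prcomp` divided out.

```r
# Sketch only: `resids` is a hypothetical, synthetic time x grid-cell matrix.
set.seed(1)
nt <- 50; ng <- 8
resids <- matrix(rnorm(nt * ng), nt, ng) %*% diag(seq(1, 8, length.out = ng))

# Unscaled, uncentered EOFs: fields are recovered as scores %*% t(loadings).
pc_raw <- prcomp(resids, retx = TRUE, center = FALSE, scale. = FALSE)
recon_raw <- pc_raw$x %*% t(pc_raw$rotation)

# Scaled EOFs: prcomp divides each column by its scale factor (the
# root-mean-square when center = FALSE), so the reconstruction has to
# multiply each column by that factor again.
pc_scl <- prcomp(resids, retx = TRUE, center = FALSE, scale. = TRUE)
recon_scl <- sweep(pc_scl$x %*% t(pc_scl$rotation), 2, pc_scl$scale, "*")

max(abs(recon_raw - resids))
max(abs(recon_scl - resids))
```

Both differences should be at the level of floating-point round-off. The extra cost of scaling is one `sweep()` per reconstruction, plus having to store `pc_scl$scale` alongside the EOFs.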

Questions:

Does scaling add value to the process sufficient to justify the extra complexity?
What harm, if any, can result from not scaling?

bkravitz commented 6 years ago

In principle, scaling will de-emphasize high latitudes and regions that tend to be associated with natural modes of variability. This matters if you're trying to make sure that you capture certain things within the first few modes. We're not interested in truncating after three modes, so I strongly suspect that it doesn't matter for our purposes. Thankfully, this is an easy thing to check: run it with scaling and without scaling, and compare the two answers. @rplzzz, is that something that can be done relatively easily? I'm happy to look at the output.
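One way to "compare the two answers" without eyeballing maps is to measure how close the leading-k EOF subspaces are, using the cosines of the principal angles between them. The sketch below assumes a synthetic `resids` matrix and an arbitrary truncation `k = 5`; both are illustrative. Because scaled EOFs are expressed in standardized units, they are first mapped back to physical units (multiplied by the per-cell scale factors) and re-orthonormalized with a QR decomposition.

```r
# Sketch only: synthetic `resids` (time x grid cell) and an arbitrary k.
set.seed(2)
nt <- 100; ng <- 20; k <- 5
resids <- matrix(rnorm(nt * ng), nt, ng) %*% diag(runif(ng, 0.5, 3))

pc_raw <- prcomp(resids, center = FALSE, scale. = FALSE)
pc_scl <- prcomp(resids, center = FALSE, scale. = TRUE)

# Orthonormal basis for the leading-k unscaled EOFs (already in physical units).
v_raw <- pc_raw$rotation[, 1:k]
# Scaled EOFs, mapped back to physical units by the scale factors,
# then re-orthonormalized so the two bases are comparable.
v_scl <- qr.Q(qr(diag(pc_scl$scale) %*% pc_scl$rotation[, 1:k]))

# Cosines of the principal angles between the two k-dimensional subspaces:
# 1 means a shared direction, 0 means orthogonal directions.
cos_angles <- svd(t(v_raw) %*% v_scl)$d
print(round(cos_angles, 3))
```

Cosines near 1 mean the two methods span nearly the same k-dimensional space, so the choice mostly affects how variance is apportioned among individual modes; small cosines mean the truncated bases genuinely differ. When all modes are kept, the two spans are identical, which fits the expectation that scaling matters less when the expansion is not truncated after a few modes.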

rplzzz commented 6 years ago

It's easy enough in principle, but in practice it's kind of a pain. Moreover, absent a quantitative measure of the quality of the results, I'm not sure on what basis we would make the decision. I'm going to try to make progress on the other tasks before I tackle this.
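As one possible quantitative measure (an assumption for illustration; nothing in this thread or in fldgen defines it), the sketch below computes the fraction of the total uncentered residual variance, in physical units, that the first k EOFs reproduce for each method. The helper `frac_captured` and the synthetic `resids` are hypothetical.

```r
# Hypothetical helper: fraction of sum(x^2) reproduced by the first k EOFs.
frac_captured <- function(x, pc, k) {
  recon <- pc$x[, 1:k, drop = FALSE] %*% t(pc$rotation[, 1:k, drop = FALSE])
  # Undo any scaling, then any centering, that prcomp applied.
  if (is.numeric(pc$scale)) recon <- sweep(recon, 2, pc$scale, "*")
  if (is.numeric(pc$center)) recon <- sweep(recon, 2, pc$center, "+")
  1 - sum((x - recon)^2) / sum(x^2)
}

# Synthetic residuals: 100 time steps x 20 grid cells with unequal variances.
set.seed(3)
resids <- matrix(rnorm(100 * 20), 100, 20) %*% diag(runif(20, 0.5, 3))
pc_raw <- prcomp(resids, center = FALSE, scale. = FALSE)
pc_scl <- prcomp(resids, center = FALSE, scale. = TRUE)

for (k in c(1L, 5L, 10L)) {
  cat(sprintf("k = %2d  unscaled: %.3f  scaled: %.3f\n", k,
              frac_captured(resids, pc_raw, k),
              frac_captured(resids, pc_scl, k)))
}
```

This measure has a built-in bias: by the Eckart–Young theorem, the unscaled, uncentered decomposition is the best rank-k approximation of `resids` in this squared-error sense, so it can never score lower than the scaled one. The number therefore quantifies what scaling costs in reconstruction error, not whether the scaled modes are more physically meaningful.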

CLynchy commented 6 years ago

In the original R code (in test-r), I had scaled the data when computing the EOFs: res_EOFs <- prcomp(resids, retx = TRUE, center = FALSE, scale = TRUE)

As BK has said, this was done because there was a LOT of variance at high latitudes, and that variance dominated the first few EOFs when the data were not scaled (scale = FALSE). With scale = TRUE, the first few EOFs were the main modes of annual variability (i.e., ENSO, the NAO, and something that looked like the PDO).
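The effect described above can be reproduced with a toy example. In the sketch below, a few hypothetical "polar" columns carry large, spatially independent noise, while many "tropical" columns share one weaker coherent signal; all sizes and standard deviations are made-up assumptions. Without scaling, EOF-1 is taken over by the noisy polar columns. With scaling, every column has unit root-mean-square, and EOF-1 becomes the shared signal.

```r
# Toy data (all parameters are illustrative assumptions).
set.seed(4)
nt <- 200
n_polar <- 5; n_trop <- 25
# "Polar" cells: large, spatially independent noise (sd = 10).
polar <- matrix(rnorm(nt * n_polar, sd = 10), nt, n_polar)
# "Tropical" cells: one shared signal (sd = 1) plus small local noise (sd = 0.3).
signal <- rnorm(nt)
trop <- signal + matrix(rnorm(nt * n_trop, sd = 0.3), nt, n_trop)
resids <- cbind(polar, trop)

# Share of EOF-1's squared loadings that falls on the polar cells.
polar_weight <- function(pc) sum(pc$rotation[1:n_polar, 1]^2)

pc_raw <- prcomp(resids, center = FALSE, scale. = FALSE)
pc_scl <- prcomp(resids, center = FALSE, scale. = TRUE)
polar_weight(pc_raw)  # near 1: EOF-1 sits on the noisy polar cells
polar_weight(pc_scl)  # near 0: EOF-1 is the shared tropical signal
```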

rplzzz commented 6 years ago

I'm not seeing anything like what @CLynchy described in my results. Here is EOF-1: [figure: EOF-1]. It's true that the north polar region is strongly represented here, but it looks a bit like the Arctic Oscillation to me. The next few EOFs don't show anything in particular going on at the poles.

In light of that, I'm going to leave out the scaling for now. We can revisit in the next iteration if we feel the need.