MESMER-group / mesmer

spatially-resolved ESM-specific multi-scenario initial-condition ensemble emulator
https://mesmer-emulator.readthedocs.io/en/latest/
GNU General Public License v3.0

Bounds to AR1 process in MESMER-M #472

Closed veni-vidi-vici-dormivi closed 3 months ago

veni-vidi-vici-dormivi commented 4 months ago

In MESMER-M we estimate the parameters of the AR1 process to generate local temporally correlated variability. In principle, this is similar to MESMER, except that we want AR parameters for each month individually. This results from the assumption that June might depend on May differently than July depends on June. As a consequence, we need to implement an in-house fitting method, which looks like this:

import numpy as np
from scipy.optimize import curve_fit


def _fit_autoregression_monthly_np(data_month, data_prev_month):

    def lin_func(x, a, b):
        return a * x + b

    slope, intercept = curve_fit(
        lin_func,
        data_prev_month,  # independent variable
        data_month,  # dependent variable
        bounds=([-1, -np.inf], [1, np.inf]),
    )[0]

    return slope, intercept
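
For orientation, here is a minimal sketch of how such a function might be applied month by month for a single gridpoint; the array layout and slicing are illustrative assumptions, not the actual MESMER-M code:

import numpy as np

# hypothetical monthly series for one gridpoint, ordered Jan, Feb, ..., Dec, Jan, ...
data = np.random.randn(50 * 12)

slopes = np.zeros(12)
intercepts = np.zeros(12)

for month in range(12):
    # indices of all values of this month that have a preceding value in the series
    idx = np.arange(month, data.size, 12)
    idx = idx[idx >= 1]
    slopes[month], intercepts[month] = _fit_autoregression_monthly_np(
        data[idx], data[idx - 1]
    )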

We fit a linear function using each value of the previous month as predictor for the next month. Herein, Shruti set the bounds for the slope to [-1, 1]. This is needed because when we later adjust the covariance we do $\Sigma_{adjusted} = \sqrt{1- \gamma_i^2} \cdot \sqrt{1- \gamma_j^2} \cdot \Sigma_{empirical}$ with $\gamma$ being the AR slope at each gridpoint and thus $\gamma \in [-1, 1]$. And indeed, without this bound, some slopes lie outside this interval.
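
For illustration, the element-wise adjustment above can be written as an outer product over gridpoints; this is only a sketch of the formula with made-up arrays, not the MESMER-M implementation:

import numpy as np

n_gridpoints = 5
rng = np.random.default_rng(0)

# hypothetical AR(1) slopes per gridpoint, all within (-1, 1)
gamma = rng.uniform(-0.9, 0.9, n_gridpoints)

# hypothetical empirical covariance matrix (symmetric positive semi-definite)
A = rng.normal(size=(n_gridpoints, n_gridpoints))
cov_empirical = A @ A.T

# Sigma_adjusted[i, j] = sqrt(1 - gamma_i^2) * sqrt(1 - gamma_j^2) * Sigma_empirical[i, j]
factor = np.sqrt(1 - gamma**2)
cov_adjusted = np.outer(factor, factor) * cov_empirical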

Now, I don't see this restriction in MESMER. Also, in our example the slopes do not fall outside this interval. So my question is: should we worry about these slopes in MESMER-M? I still don't completely understand the rationale behind the adjustment, so I'm sceptical about enforcing these bounds uninformed.

mathause commented 4 months ago

I assume we should not add bounds; instead it should be normalized (https://en.wikipedia.org/wiki/Autocorrelation#Normalization), but I could not find this in the sklearn code. Can you compare the output of your manual function with AutoReg(data, lags=lags).fit().params?

Can you ask Lukas about this - I suspect he'll know.

https://github.com/MESMER-group/mesmer/blob/6a30c2781206e240d7186e09544afc5e72abeed5/mesmer/stats/_auto_regression.py#L584

veni-vidi-vici-dormivi commented 4 months ago

How would you compare to the result of AutoReg? Or rather, what would you want data to be? When I take the whole time series, i.e. all months, as data I get one coefficient and one intercept instead of the twelve that I get now. When I give AutoReg a dataset containing months pairwise, say June and July, it will not only take the June values to predict July but also the July values to predict June, and mix that up into the coefficients. So I'm not sure what to do here.

mathause commented 4 months ago

I meant the simplest example - an example array and an autoregression with lag 1 - not the whole month-by-month circus. If we can get the same coefficients, that's a good indication that our approach is sound. Does that make sense?

mathause commented 4 months ago

import numpy as np

from statsmodels.tsa.ar_model import AutoReg
from sklearn.linear_model import LinearRegression

arr = np.random.randn(100)

for i in range(1, arr.size):
    arr[i] = arr[i] + 0.7 * arr[i -1]

# via AutoReg
AR_model = AutoReg(arr, lags=1)
AR_result = AR_model.fit()

# via LinearRegression
reg = LinearRegression(fit_intercept=True)
reg.fit(X=arr[:-1].reshape(-1, 1), y=arr[1:])
LR_params = np.hstack((reg.intercept_, reg.coef_))

print(f"AR: {AR_result.params}")
print(f"LR: {LR_params}")

And it seems they yield the same result...

mathause commented 4 months ago

And here is the code using curve_fit:

import scipy as sp

def lin_func(x, a, b):
    return a * x + b

sp.optimize.curve_fit(lin_func, arr[:-1], arr[1:])[0]

Maybe curve_fit was used because bounds can be added? Thinking a bit more about this: an AR process can have coefficients $\gt 1$; however, if this is the case, there is a runaway effect - after 100 time steps the values will be huge. As there are other months in between, it may be possible here. It would be interesting to get some example data where the AR coefficients exceed 1. Does that mean the seasonal cycle was not properly removed in this case?

arr = np.random.randn(100)
for i in range(1, arr.size):
    arr[i] = arr[i] + 1.2 * arr[i -1]

Interestingly:


%timeit AutoReg(arr, lags=1).fit()
%timeit LinearRegression(fit_intercept=True).fit(X=arr[:-1].reshape(-1, 1), y=arr[1:])
%timeit sp.stats.linregress(arr[:-1], arr[1:])
%timeit sp.optimize.curve_fit(lin_func, arr[:-1], arr[1:])[0]

Also, for lag > 1:

# with a window of lags + 1 this yields the lagged predictors (most recent lag first)
X = np.lib.stride_tricks.sliding_window_view(arr, 2)[:, ::-1][:, 1:]

mathause commented 4 months ago

Also AutoRegression is a wrong name here because it's not regressed against itself. So is _adjust_ecov_ar1_np even appropriate?

veni-vidi-vici-dormivi commented 4 months ago

there is a runaway effect - after 100 time steps the values will be huge. As there are other months in between, it may be possible here. It would be interesting to get some example data where the AR coefficients exceed 1.

For our coarse dataset there are two gridpoints where one month each yields a coefficient > 1. But because the coefficients are only marginally above one and the following month again has a slope smaller than 1, there is no runaway effect in this case... In general, for the deterministic predictions we end up with only the intercepts actually determining the values, because the product of the previous value and the coefficients is too small.

Does that mean the seasonal cycle was not properly removed in this case?

Hm why do you think so?

Also AutoRegression is a wrong name here because it's not regressed against itself. So is _adjust_ecov_ar1_np even appropriate?

That's also what I wonder... I'll consult the smart book Lukas gave me. Will be back on Wednesday.

veni-vidi-vici-dormivi commented 4 months ago

Does that mean the seasonal cycle was not properly removed in this case?

Hm why do you think so?

Ah, now I get it. Let me think about that a bit more.

veni-vidi-vici-dormivi commented 4 months ago

Also _Auto_Regression is a wrong name here because it's not regressed against itself. So is _adjust_ecov_ar1_np even appropriate?

It's a seasonal Autoregression. Can you send me the Frei paper you were talking about concerning the adjustment of the covariance matrix?

mathause commented 4 months ago

Going back to the bounds: we can also create an artificial time series with a monthly AR1 process and see how wrong the standard deviation estimate is, to check whether we really need to fix the standard deviation or whether it's actually ok (in contrast to the "normal" AR1 process).

veni-vidi-vici-dormivi commented 4 months ago

Ah very good point, I'll try that.

veni-vidi-vici-dormivi commented 4 months ago

Okayy. I learnt a lot about this yesterday and I'll try to wrap it up neatly here.

Step one: Fitting of the AR parameters

So what we do at the moment is fit an AR process with parameters for each month and a lag of one. The underlying assumption is that each month depends linearly on the previous month, but each month with different parameters. The book "Statistical Analysis in Climate Research" by von Storch and Zwiers (1999, reprinted 2003, pages 209 and 210) calls this a seasonal AR(p) process or cyclo-stationary AR(p) process. It is defined there as follows:

$$ X_{t, \tau} = \alpha_{0, \tau} + \sum_{k=1}^p \alpha_{k,\tau} X_{t, \tau-k} + Z_{t, \tau} $$

Here $\tau = 1...N$ denotes the "season", in our case the month, and $t$ denotes the repetitions of the external cycle, in our case the years. $\mathbf{Z}$ is a white noise process. Such a seasonal AR process is characterized by cycles of length $N$ in the mean, the variance and the auto-covariance function. This means that it is not the overall time series that has to be stationary ($\approx$ constant moments) but the monthly series. Thus, the overall process can be stable without all the $|\alpha_k|$ being smaller than one!

Thus, the first lesson is that we should not put bounds of [-1, 1] on our process!
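
To illustrate the point, here is a small sketch (not from the thread) that simulates a cyclo-stationary AR(1) in which one monthly coefficient is larger than 1; the series stays bounded because stability of the full cycle depends, roughly speaking, on the combined effect of all twelve coefficients, not on each one individually:

import numpy as np

rng = np.random.default_rng(0)

# monthly AR(1) coefficients; one of them exceeds 1
alpha = np.array([0.3, 0.5, 1.2, 0.4, 0.2, 0.6, 0.3, 0.5, 0.4, 0.2, 0.3, 0.5])
print(np.prod(np.abs(alpha)))  # product over a full cycle is far below 1

n_years = 1000
x = np.zeros(n_years * 12)
for t in range(1, x.size):
    x[t] = alpha[t % 12] * x[t - 1] + rng.normal(0, 1)

# the series stays bounded even though one |alpha| > 1
print(x.min(), x.max())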

What we mean by a seasonal AR process is that each month/season depends on the previous month/season with month/season-dependent parameters. This makes intuitive sense to me for this application: each July depends in a specific way on the previous June.

Interestingly, the vast majority of the internet does NOT mean what we mean here by a seasonal AR process. Instead, what is often denoted as a seasonal AR process is that the value of a season depends on the value of the same season in the last year, i.e. the December sales will be similar to the December sales last year. This approach is tightly connected to moving average processes and seasonal differencing, all combined into the SARIMA model class.

Everything I could find on seasonal AR processes was about SARIMA models. I understand too little about them to judge how useful they are for our case, but what I can say is that in the long run it could be worth a try to check them out. I do think a SARIMA approach could achieve what we want. Nevertheless, I opted not to go into that further because I am also sure that it cannot directly replicate what we do at the moment; specifically, it cannot produce parameters specific to each month. It would be quite a huge change to the method and a lot of work to implement, because you need to do a lot of scoping to find the right parameters (order of the AR process, the seasonal differencing and the moving average).
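
For reference, a minimal sketch of how such a SARIMA-style seasonal AR term could be specified with statsmodels; this is only meant to illustrate the modelling idea described above, not something we would plug into MESMER-M as-is:

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
data = rng.normal(size=240)  # hypothetical monthly series, 20 years

# AR(1) on the previous month plus a seasonal AR(1) on the same month last year;
# this yields one set of coefficients shared by all months, not twelve monthly ones
model = SARIMAX(data, order=(1, 0, 0), seasonal_order=(1, 0, 0, 12))
result = model.fit(disp=False)
print(result.params)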

Add-on here: googling cyclo-stationary AR processes yields wildly different results, but also far fewer results overall, most of them in the realm of signal processing. The literature either seems pretty old or too specific to other problems, so I also abandoned looking into that further.

So, I would opt for keeping our approach and removing the bounds. And maybe renaming the functions to x_cyclo_stationary_AR1; the current _monthly just seems a bit odd, but we could also keep it. I would want to avoid seasonal_AR1, since people might think we use the SARIMA approach.

Step two: Finding the driving white noise parameters (covariance)

After fitting our seasonal AR(1) process, we compute its deterministic part (leaving out the white noise) and fit the covariance matrix on the residuals after removing this deterministic part from our data. After we have found the best localisation radius and corresponding covariance matrix, we adjust the covariance matrix as we would for a normal AR process:

$$ \Sigma_{\nu\ \tau,i,j} = \sqrt{1-\alpha_{1,\tau,i}^2} \cdot \sqrt{1-\alpha_{1,\tau,j}^2} \ \Sigma_{\eta\ \tau, i, j} $$

This stems from the fact that for an ordinary AR(1) process in time:

$$ Var(X_t) = \frac{\sigma^2_{Z}}{1 - \alpha_1^2} $$

So the variance of the time series $Var(\mathbf{X}_t)$ (which we observe) is bigger than the variance of the driving white noise process $\sigma^2_Z$ (which we need). For the "normal" AR(1) process this correction can be done analytically very nicely.
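
As a quick sanity check of this relation (a standalone sketch, not part of the original discussion), one can simulate an ordinary AR(1) process and compare the empirical variance with $\sigma^2_Z / (1 - \alpha_1^2)$:

import numpy as np

rng = np.random.default_rng(0)

alpha, sigma_z = 0.7, 2.0
x = np.zeros(100_000)
for t in range(1, x.size):
    x[t] = alpha * x[t - 1] + rng.normal(0, sigma_z)

print(x.var())                      # empirical variance of the series
print(sigma_z**2 / (1 - alpha**2))  # theoretical value: 4 / 0.51 ≈ 7.84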

However, for our seasonal AR process this is less nice:

$$ Var(X_{t, \tau}) = \sum_{j=1}^\infty \beta_{j, \tau}^2 \ \sigma^2_{Z,\ \tau - j + 1} $$

where $\beta$ is a function of the AR parameters $\alpha_{k,\tau}$, derived by transforming the seasonal AR equation above into an infinite moving average process (I don't fully understand the derivation of that one). It is possible that with more mathematical wit than I have access to one could simplify this for a seasonal AR(1) process, but my street wit tells me that if that were the case, they would have written it in the book.

Note that the adjustment we are doing now is not suitable for cases where the AR parameters are bigger than one, which, as we established, they can be. Moreover, fitting the covariance matrix to the residuals of the deterministic AR process without noise is useless imo, because it comes out to be not more than subtracting a constant for each month, which does not change anything about the variance of the data.

So my conclusion is that the adjustment we are doing is wrong but we can't come up with an equally practical adjustment factor for our case.

So what can we do? The internet tells me that we can find the variance by taking the variance of the residuals from the AR fit. The residuals are thereby computed using the actual data as predictors; see below.

Example

I have put together an example that demonstrates the approach:

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

I generate data with seasonality (including AR params > 1) and a noise component with the same std of 2 for all months:

n_ts = 1200
buffer = 12*2
intercept = np.arange(1, 13)
slope = np.linspace(-1.2,1.2,12)
np.random.seed(0)
out = np.zeros(n_ts+buffer)
for t in range(1, n_ts + buffer):
    month = t % 12
    out[t] = (
        intercept[month]
        + slope[month] * out[t - 1]
        + np.random.normal(0, 2)
    )
out = out[buffer:]

I fit the AR params:

def lin_func(indep_var, slope, intercept):
    return slope * indep_var + intercept
slopes = np.zeros(12)
intercepts = np.zeros(12)

for month in range(12):
    data_month = out[month::12]
    data_prev_month = out[month - 1::12]
    slopes[month], intercepts[month] = sp.optimize.curve_fit(
        lin_func,
        data_prev_month,  # independent variable
        data_month,  # dependent variable
        # bounds=([-1, -np.inf], [1, np.inf]),
    )[0]

I calculate the residuals using the fitted parameters and the empirical data:

residuals = np.zeros(n_ts)
for month in range(12):
    residuals[month::12] = out[month::12] - lin_func(out[month - 1::12], slopes[month], intercepts[month])

For demonstration purposes I calculate the deterministic AR process:

deterministic = np.ones(n_ts + buffer)
for t in range(1, n_ts + buffer):
    month = t % 12
    deterministic[t] = (
        intercept[month]
        + slope[month] * deterministic[t - 1]
    )
deterministic = deterministic[buffer:]

Plot the results:

fig, ax = plt.subplots()
for m in range(12):
    ax.scatter(m, np.std((out)[m::12]), color = "green", marker = ".", label = "std of data")
    ax.scatter(m, np.abs(slope[m]), color = "red", marker = ".", label = "|AR slopes|")

    ax.scatter(m, np.sqrt(1-slope[m]**2)*np.std((out)[m::12]), color = "blue", marker = "o", label = "adjusted std")
    ax.scatter(m, np.sqrt(1-slope[m]**2)*np.std((out-deterministic)[m::12]), color = "lightblue", marker = "x", label = "adjusted std (wo AR)")
    ax.scatter(m, np.std(residuals[m::12]), color = "purple", marker = "s", facecolors = "none", label = "std of residuals")

ax.hlines(2, 0, 12)

handles, labels = ax.get_legend_handles_labels()
unique = [(h, l) for i, (h, l) in enumerate(zip(handles, labels)) if l not in labels[:i]]
unique_handles, unique_labels = zip(*unique)
ax.legend(unique_handles, unique_labels)

(figure: per-month comparison of the std of the data, |AR slopes|, the adjusted std with and without removing the deterministic part, and the std of the residuals)

What we see is that the std of the data (green dots) seems to scale with the magnitude of the AR parameters (red dots). Note that the blue line marks the actual std of the driving white noise. The adjusted std using the old method (blue dots) actually behaves quite well for small AR parameters, but becomes worse for AR parameters closer to -1 and 1, and yields NaN for parameters > 1. We also see that removing the deterministic part doesn't change the std (light blue crosses). The std of the residuals, however, performs pretty well. I am not sure what exactly is happening for month 0 though...

Thus, I conclude that we should remove the bounds and estimate the covariance matrix on the residuals of the AR process. And keep in mind the SARIMA stuff for future development.
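
To make the proposal concrete, here is a condensed sketch of how the two changes might fit together: an unbounded monthly AR(1) fit per gridpoint, residuals computed with the actual data as predictors, and a per-month covariance estimated from those residuals. The array shapes, the helper names (fit_and_residuals, lin_func) and the absence of localisation are all illustrative assumptions, not the MESMER-M implementation:

import numpy as np
import scipy as sp


def lin_func(x, a, b):
    return a * x + b


def fit_and_residuals(series):
    # series: 1D monthly time series for one gridpoint, ordered Jan, Feb, ..., Dec, Jan, ...
    slopes, intercepts = np.zeros(12), np.zeros(12)
    residuals = np.zeros_like(series)
    for month in range(12):
        idx = np.arange(month, series.size, 12)
        idx = idx[idx >= 1]  # the very first value has no predecessor
        slopes[month], intercepts[month] = sp.optimize.curve_fit(
            lin_func, series[idx - 1], series[idx]  # no bounds on the slope
        )[0]
        residuals[idx] = series[idx] - lin_func(series[idx - 1], slopes[month], intercepts[month])
    return slopes, intercepts, residuals


# hypothetical data: 30 years of monthly values for 5 gridpoints
rng = np.random.default_rng(0)
data = rng.normal(size=(30 * 12, 5))

residuals = np.stack([fit_and_residuals(data[:, g])[2] for g in range(5)], axis=1)

# per-month covariance across gridpoints, estimated from the residuals
cov_per_month = [np.cov(residuals[month::12], rowvar=False) for month in range(12)]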

mathause commented 4 months ago

Very nice summary - thanks! The estimated AR1 params are astonishingly close to the real ones... Month 0 does look curious - double check with other numbers? Could there still be a bug somewhere?

veni-vidi-vici-dormivi commented 3 months ago

Okay, found the bug. It takes a bit more thinking to get the handling of the first month right, but I got it now. The beginning stays the same:

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

n_ts = 1200
buffer = 12*2
intercept = np.arange(1, 13)
slope = np.linspace(-1.2,1.2,12)
np.random.seed(0)

out = np.zeros(n_ts+buffer)
for t in range(1, n_ts + buffer):
    month = t % 12
    out[t] = (
        intercept[month]
        + slope[month] * out[t - 1]
        + np.random.normal(0, 2)
    )
out = out[buffer:]

Then we need to properly handle the fact that the first value has no previous value.

def lin_func(indep_var, slope, intercept):
    return slope * indep_var + intercept
slopes = np.zeros(12)
intercepts = np.zeros(12)

for month in range(12):
    if month == 0:
        data_month = out[0::12][1:]
        data_prev_month = out[11::12][:-1]
    else:
        data_month = out[month::12]
        data_prev_month = out[month-1::12]

    slopes[month], intercepts[month] = sp.optimize.curve_fit(
        lin_func,
        data_prev_month,  # independent variable
        data_month,  # dependent variable
        #bounds=([-1, -np.inf], [1, np.inf]),
    )[0]

And the same for the predictions; we also set the first residual to 0, since we have no predictor data for it.

predicted = np.zeros(n_ts)
for month in range(12):
    if month == 0:
        prev_month_data = out[11::12]
        predicted[12::12] = lin_func(prev_month_data, slopes[month], intercepts[month])[:-1]
    else:
        prev_month_data = out[month - 1::12]
        predicted[month::12] = lin_func(prev_month_data, slopes[month], intercepts[month])
residuals = out - predicted
residuals[0] = 0
deterministic = np.ones(n_ts + buffer)
for t in range(1, n_ts + buffer):
    month = t % 12
    deterministic[t] = (
        intercept[month]
        + slope[month] * deterministic[t - 1]
    )
deterministic = deterministic[buffer:]

Looks even better now:

fig, ax = plt.subplots()
for m in range(12):
    ax.scatter(m, np.std((out)[m::12]), color = "green", marker = ".", label = "std of data")
    ax.scatter(m, np.abs(slope[m]), color = "red", marker = ".", label = "|AR slopes|")
    ax.scatter(m, np.abs(slopes[m]), color = "pink", marker = "x", label = "|estim. AR slopes|")

    ax.scatter(m, np.sqrt(1-slope[m]**2)*np.std((out)[m::12]), color = "blue", marker = "o", label = "adjusted std")
    ax.scatter(m, np.sqrt(1-slope[m]**2)*np.std((out-deterministic)[m::12]), color = "lightblue", marker = "x", label = "adjusted std (wo AR)")
    ax.scatter(m, np.std(residuals[m::12]), color = "purple", marker = "s", facecolors = "none", label = "std of residuals")

ax.hlines(2, 0, 12)

handles, labels = ax.get_legend_handles_labels()
unique = [(h, l) for i, (h, l) in enumerate(zip(handles, labels)) if l not in labels[:i]]
unique_handles, unique_labels = zip(*unique)
ax.legend(unique_handles, unique_labels)

(figure: same comparison as above, now also including the estimated |AR slopes|)

For higher slopes the difference between the approaches becomes even bigger, as it also does for higher-variance noise. None of the bugs I fixed above are in the code I already implemented.