why are tas residuals "local variability" and not "local trend"?

mathause commented 2 years ago

"gvtas" (or gv_novolc_T_s or the tas residuals after smoothing/ removing volcanic spikes) is a predictor for the local trend (calibrate_mesmer.py#L289). However, afterwards it is counted towards the local variability (and not the local trend).

I kind of get why this is done - it's not a "trend" - but also a bit confusing & more difficult to code. I wonder if there is a practical difference. E.g. to find the local variability both are subtracted from the actual data:

https://github.com/MESMER-group/mesmer/blob/5a456657147b226d14211aba20f148c34fa881ce/mesmer/calibrate_mesmer/calibrate_mesmer.py#L319

I.e. for this is seems to make no difference. I need to check the creation of emulations - are the tas residuals a forcing for the emulations?

Ok, "coef_gvtas" is part of the "mesmer bundle" but under local variability (bundle["params_lv"]['coef_gvtas']). So I assume (for the moment) it is mostly a semantic difference and not a practical one.

cc @leabeusch

leabeusch commented 1 year ago

& hi also on this at least a tiny bit less long on-GitHub-not-answered-to issue. 😅 To finally put it into writing as well: the reason for this special separation is a mismatch between scientific conceptual needs (wanting to separate “forced response” processes from “internal variability” processes) & computational needs (maximizing the overall statistical fit by using a single multiple regression containing predictors addressing various processes, since worse statistical fits would be obtained if e.g., separate linear regressions were carried out for the different processes). My pragmatic - although not too satisfying – solution was to fit a single multiple linear regression (to fulfil the computational needs) but store the resulting calibrated regression coefficients in process-specific dictionaries (to fulfil the conceptual scientific needs)-> i.e., train_lt() returns params_lt(), which contains the forced response parameters (i.e., the intercepts & the regression coefficients for the forced response), & params_lv(), which contains the regression coefficients for the global variability. The params_lv() dictionary later on gets extended with the parameters needed to statistically model the residual local variability (i.e., parameters for AR(1) processes with spatially correlated innovations) through the train_lv() function (maybe a more suitable name for this function would be: train_residual_lv()?!). & this train_lv() function takes as target entry the residual local variability, which is obtained – as shown in your code excerpt above - by subtracting from tas (here tas_s[scen]) all the temperature anomaly contributions that can be explained with the multiple linear regression (i.e., lt_s[scen][“tas”] for the forced response part and lv_gv_s[scen][“tas”] for the local variability induced by global variability part). When subsequently creating new local variability emulations, the local variability induced by global variability part is driven by the - also stochastically – created global variability emulations. The residual variability part on the other hand doesn’t require additional predictors, since it’s fully stochastically created from its AR(1) parameters.

As you can see, it’s unfortunately messy for a good reason. If you have any ideas for a clearner solution that also fits both conceptional scientific & computational needs, that’d be fantastic @mathause! 😄

Also: just as a reminder, for your new MESMER implementation, I’d really welcome a name change from “trend” to “forced_response” (since you made me aware that “trend” is a misleading name for what it represents 😉). This renaming already took place within the papers (both the MAGICC-MESMER coupling & the major emitters study use the term “forced response”) but I never introduced it into the code (& I think you haven’t either so far?).

mathause commented 1 year ago

Thanks!

If you have any ideas for a cleaner solution that also fits both conceptional scientific & computational needs

Unfortunately I don't - so far.

I’d really welcome a name change from “trend” to “forced_response”

Good point and this has indeed not been done so far.

When creating emulations did you usually use the tas residuals as forcing?

mathause commented 1 year ago

So the "new" approach is currently:

predictors = {
    "tas_globmean": [tas_hist_globmean_smooth_volc, tas_proj_smooth],
    "tas_globmean_resid": [tas_hist_resid_novolc.tas, tas_proj_resid.tas],
}

lr = mesmer.stats.linear_regression.LinearRegression()

lr.fit(
    predictors=predictors,
    target=tas_mu.tas,
    dim="time",  # switch to sample?
)

This will have to become:

pred_local_trend = {
    "tas_globmean": [tas_hist_globmean_smooth_volc, tas_proj_smooth],
}

pred_local_var = {
    "tas_globmean": [tas_hist_globmean_smooth_volc, tas_proj_smooth],
    "tas_globmean_resid": [tas_hist_resid_novolc.tas, tas_proj_resid.tas],
}

mesmer.calibrate.lin_reg_local(
    local_trend=pred_local_trend,
    local_var=pred_local_var,
    target=tas_mu.tas,
    dim="time",
)

(where the func name has to be defined.

MESMER-group / mesmer

why are tas residuals "local variability" and not "local trend"? #208