weighting of scenarios?

mathause commented 4 days ago

I always thought that the scenario weights applied to the linear regression were given by 1 / (n_ens * n_ts). However, they are 1 / n_ens. I probably misinterpreted this. The original code (v0.8.0) is here:

https://github.com/MESMER-group/mesmer/blob/13f048b1106faf302755a6181358243b43fffb5b/mesmer/calibrate_mesmer/train_utils.py#L50-L58

I refactored this in #143 and adapted the comment to

https://github.com/MESMER-group/mesmer/blob/456776d4a318e50bc7f642f097354c76c24a21fc/mesmer/calibrate_mesmer/train_utils.py#L14-L15

but importantly the code stayed the same:

https://github.com/MESMER-group/mesmer/blob/456776d4a318e50bc7f642f097354c76c24a21fc/mesmer/calibrate_mesmer/train_utils.py#L39

From Beusch et al. (2022):

"To obtain robust MESMER parameter estimates for each ESM, MESMER is trained on all available ensemble members of each available scenario and equal weight is given to each scenario."

I think it's not 100% clear - you could argue that the historical scenario does get a bit more weight as it has more time steps. But saying the weight is 1 / n_ens is a just-as-valid interpretation of "equal weight for each scenario". So in conclusion there is nothing to do here (except maybe to adapt my comment).
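To make the two readings concrete, here is a minimal sketch (not the MESMER code) comparing the total weight each scenario receives under 1 / n_ens and under 1 / (n_ens * n_ts). The scenario names, member counts, and time step counts are made-up illustrative values, and `sample_weights` and `total_weight_per_scenario` are hypothetical helpers.

```python
import numpy as np

# Hypothetical setup (illustrative numbers only): ensemble members and
# time steps per scenario.
scenarios = {
    "historical": {"n_ens": 3, "n_ts": 165},
    "ssp126": {"n_ens": 2, "n_ts": 86},
    "ssp585": {"n_ens": 5, "n_ts": 86},
}


def sample_weights(n_ens, n_ts, per_time_step=False):
    """Return one weight per (member, time step) sample of a scenario.

    per_time_step=False -> each sample gets 1 / n_ens
    per_time_step=True  -> each sample gets 1 / (n_ens * n_ts)
    """
    weight = 1.0 / (n_ens * n_ts) if per_time_step else 1.0 / n_ens
    return np.full(n_ens * n_ts, weight)


def total_weight_per_scenario(scenarios, per_time_step=False):
    """Sum the sample weights of each scenario."""
    return {
        name: float(sample_weights(**dims, per_time_step=per_time_step).sum())
        for name, dims in scenarios.items()
    }


# 1 / n_ens: the total weight of a scenario equals its number of time steps,
# so a longer scenario (here "historical") carries more total weight.
totals_ens = total_weight_per_scenario(scenarios)

# 1 / (n_ens * n_ts): every scenario sums to a total weight of 1.
totals_ens_ts = total_weight_per_scenario(scenarios, per_time_step=True)

print(totals_ens)
print(totals_ens_ts)
```

Under 1 / n_ens the number of ensemble members cancels out, but the number of time steps does not, which is why the historical scenario ends up with somewhat more total weight in this sketch.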

Originally commented in https://github.com/MESMER-group/mesmer/pull/567#pullrequestreview-2464567678

edit: corrected n_scen -> n_ens

veni-vidi-vici-dormivi commented 4 days ago

Shouldn't n_scen be n_ens or n_runs? Or do you actually mean n_scen? If you were to weight each sample by 1/n_scen, scenarios with more members would be overrepresented.

mathause commented 4 days ago

> Shouldn't n_scen be n_ens or n_runs? Or do you actually mean n_scen? If you were to weight each sample by 1/n_scen, scenarios with more members would be overrepresented.

Yes, you are right - I meant n_ens. I'll correct it above.
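For illustration, a small sketch of why 1 / n_scen would overrepresent scenarios with more members, while 1 / n_ens does not. The scenario names and counts are hypothetical, and both scenarios are assumed to have the same number of time steps.

```python
# Hypothetical example: two scenarios with the same number of time steps
# but a different number of ensemble members.
n_ts = 86
n_ens = {"ssp126": 2, "ssp585": 5}
n_scen = len(n_ens)

# Weighting each sample by 1 / n_scen: the total weight of a scenario
# scales with its number of members.
total_n_scen = {name: n * n_ts * (1.0 / n_scen) for name, n in n_ens.items()}

# Weighting each sample by 1 / n_ens: the member count cancels out.
total_n_ens = {name: n * n_ts * (1.0 / n) for name, n in n_ens.items()}

# How much more total weight the larger ensemble gets under 1 / n_scen.
ratio = total_n_scen["ssp585"] / total_n_scen["ssp126"]

print(total_n_scen)
print(total_n_ens)
print(ratio)
```

With these numbers, the five-member scenario gets 2.5 times the total weight of the two-member scenario under 1 / n_scen, whereas both scenarios get the same total weight under 1 / n_ens.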

weighting of scenarios? #569