MESMER-group / mesmer

spatially-resolved ESM-specific multi-scenario initial-condition ensemble emulator
https://mesmer-emulator.readthedocs.io/en/latest/
GNU General Public License v3.0

weighting of scenarios? #569

Open mathause opened 4 days ago

mathause commented 4 days ago

I always thought that the scenario weights applied in the linear regression were given by 1 / (n_ens * n_ts). However, they are 1 / n_ens. I probably misinterpreted this. The original code (v0.8.0) is here:

https://github.com/MESMER-group/mesmer/blob/13f048b1106faf302755a6181358243b43fffb5b/mesmer/calibrate_mesmer/train_utils.py#L50-L58

I refactored this in #143 and adapted the comment to

https://github.com/MESMER-group/mesmer/blob/456776d4a318e50bc7f642f097354c76c24a21fc/mesmer/calibrate_mesmer/train_utils.py#L14-L15

but importantly the code stayed the same:

https://github.com/MESMER-group/mesmer/blob/456776d4a318e50bc7f642f097354c76c24a21fc/mesmer/calibrate_mesmer/train_utils.py#L39
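For illustration, here is a minimal sketch of the 1 / n_ens per-sample weighting and how such weights enter a weighted linear regression. It assumes samples are stacked scenario by scenario, member by member, and that all members of a scenario have the same number of time steps. `per_sample_weights` and `weighted_linear_regression` are hypothetical helpers, not MESMER's API, and the fit uses plain NumPy weighted least squares, which may differ from what MESMER calls internally:

```python
import numpy as np


def per_sample_weights(n_ens_per_scen, n_ts_per_scen):
    """Hypothetical helper: weight every sample of a scenario by 1 / n_ens.

    Assumes samples are stacked scenario by scenario and that all members
    of a scenario have the same number of time steps.
    """
    weights = [
        # each of the n_ens * n_ts samples of this scenario gets 1 / n_ens
        np.full(n_ens * n_ts, 1.0 / n_ens)
        for n_ens, n_ts in zip(n_ens_per_scen, n_ts_per_scen)
    ]
    return np.concatenate(weights)


def weighted_linear_regression(x, y, weights):
    """Hypothetical helper: weighted least squares fit of y = a + b * x."""
    design = np.column_stack([np.ones_like(x), x])
    sqrt_w = np.sqrt(weights)
    # scaling rows by sqrt(w) turns the weighted problem into ordinary least squares
    coeffs, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coeffs  # [intercept a, slope b]


# toy example: scenario A has 1 member, scenario B has 4 members, 3 time steps each
weights = per_sample_weights([1, 4], [3, 3])
x = np.linspace(0.0, 1.0, weights.size)
y = 2.0 + 3.0 * x
coeffs = weighted_linear_regression(x, y, weights)
```

In this toy case, the single member of A gets weight 1 per time step and each of the four members of B gets 0.25, so both scenarios carry the same total weight. They do so only because they have the same number of time steps.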

From Beusch et al. (2022):

"To obtain robust MESMER parameter estimates for each ESM, MESMER is trained on all available ensemble members of each available scenario and equal weight is given to each scenario."


I think it's not 100% clear: you could argue that the historical scenario gets a bit more weight because it has more time steps. But saying the weight is 1 / n_ens is an equally valid interpretation of "equal weight for each scenario". So, in conclusion, there is nothing to do here (except maybe to adapt my comment).
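To make the time-step argument explicit, this is the total weight $W_s$ a scenario $s$ receives, writing $n_{\mathrm{ens},s}$ and $n_{\mathrm{ts},s}$ for its number of members and time steps and $w_s$ for the per-sample weight (notation introduced here for illustration, not taken from the MESMER code):

```latex
W_s = \sum_{m=1}^{n_{\mathrm{ens},s}} \sum_{t=1}^{n_{\mathrm{ts},s}} w_s
    = n_{\mathrm{ens},s}\, n_{\mathrm{ts},s}\, w_s,
\qquad
w_s = \frac{1}{n_{\mathrm{ens},s}} \;\Rightarrow\; W_s = n_{\mathrm{ts},s},
\qquad
w_s = \frac{1}{n_{\mathrm{ens},s}\, n_{\mathrm{ts},s}} \;\Rightarrow\; W_s = 1.
```

With 1 / n_ens, a scenario's total weight no longer depends on its ensemble size but still scales with its number of time steps, which is the sense in which the historical scenario, having more time steps, gets a bit more weight.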

Originally commented in https://github.com/MESMER-group/mesmer/pull/567#pullrequestreview-2464567678

edit: corrected n_scen -> n_ens

veni-vidi-vici-dormivi commented 4 days ago

Shouldn't n_scen be n_ens or n_runs? Or do you actually mean n_scen? If you were to weight each sample by 1/n_scen, scenarios with more members would be overrepresented.
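The overrepresentation can be checked with a small toy calculation. This is a sketch with made-up ensemble sizes (two scenarios of equal length with 1 and 9 members) and hypothetical helper names. It compares each scenario's share of the total weight when every sample gets the same weight 1/n_scen versus 1/n_ens:

```python
import numpy as np


def scenario_shares(n_ens, n_ts, per_sample_weight):
    """Hypothetical helper: fraction of the total weight falling on each scenario."""
    n_ens = np.asarray(n_ens, dtype=float)
    n_ts = np.asarray(n_ts, dtype=float)
    # a scenario contributes n_ens * n_ts samples, all with the same weight
    totals = n_ens * n_ts * per_sample_weight(n_ens)
    return totals / totals.sum()


def uniform_weight(n_ens):
    # hypothetical: same weight 1 / n_scen for every sample, regardless of ensemble size
    return np.full_like(n_ens, 1.0 / n_ens.size)


def inverse_ensemble_weight(n_ens):
    # weight 1 / n_ens for every sample of a scenario
    return 1.0 / n_ens


n_ens = [1, 9]   # members per scenario (made-up)
n_ts = [86, 86]  # time steps per scenario (made-up, equal length)
shares_uniform = scenario_shares(n_ens, n_ts, uniform_weight)
shares_inverse = scenario_shares(n_ens, n_ts, inverse_ensemble_weight)
```

With 1/n_scen, the 9-member scenario carries 90% of the total weight; with 1/n_ens, both scenarios carry half.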

mathause commented 4 days ago

Yes, you are right: I mean n_ens. I'll correct it above.