Open mathause opened 10 months ago
And to complete this, an example:

```python
import numpy as np

arr = np.array([1, 0, 1, 5, 2])
np.unique(arr)
# array([0, 1, 2, 5])
```

And a potential bugfix:

```python
__, idx = np.unique(arr, return_index=True)
arr[np.sort(idx)]
# array([1, 0, 5, 2])
```
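For reuse, the bugfix above can be wrapped in a small helper (the name `unique_unsorted` is my own, not part of MESMER):

```python
import numpy as np

def unique_unsorted(arr):
    """Return the unique elements of `arr` in order of first appearance.

    Unlike np.unique, this does not sort the result.
    """
    arr = np.asarray(arr)
    # np.unique with return_index=True gives the index of the *first*
    # occurrence of each unique value; sorting those indices restores
    # the original order of appearance
    __, idx = np.unique(arr, return_index=True)
    return arr[np.sort(idx)]

print(unique_unsorted([1, 0, 1, 5, 2]))  # [1 0 5 2]
```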
Nice find! I can fully imagine that it was a huge pain to work all this out.

> I replicated a workflow with several ensemble members outside of "legacy" MESMER and the params of the "local forced response" did not line up

Are you building MESMER again elsewhere? We've started doing these sorts of re-writes more and more in other stuff we maintain and have found them incredibly helpful. I'd be interested to hear what the plan is here (but no pressure, you have lots of things happening).
In `separate_hist_future` the historical ensembles are de-duplicated using `np.unique`. This shuffles the ensemble members "randomly" because `np.unique` sorts the output!

TLDR: This messes up the order of the residual variability and underestimates its influence.
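A minimal sketch of the shuffling, using CMIP-style member labels (the IDs are invented for illustration, this is not MESMER code):

```python
import numpy as np

members = np.array(["r2i1p1f1", "r10i1p1f1", "r1i1p1f1", "r2i1p1f1"])

# np.unique de-duplicates but also sorts (lexically, here),
# so the members come back in a different order
print(np.unique(members))
# ['r10i1p1f1' 'r1i1p1f1' 'r2i1p1f1']

# order-preserving de-duplication keeps the original member order
__, idx = np.unique(members, return_index=True)
print(members[np.sort(idx)])
# ['r2i1p1f1' 'r10i1p1f1' 'r1i1p1f1']
```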
- This matters in `train_gv`, as the target (`tas`) and one of the predictors (`gv_novolc_tas_s`, the residual variability) is split. Now the target and residual variability have random order, leading to a wrong result.
- The other predictors (`tas`, `tas2`, `hfds`) are averaged over the ensemble members (i.e. are all the same for all ensemble members) - so for them it does not matter.
- `separate_hist_future` is also used in `train_gt`, but that does not matter, because the hist simulations are then averaged over.

The story is: I replicated a workflow with several ensemble members outside of "legacy" MESMER and the params of the "local forced response" did not line up. It took way too long comparing and trying to re-order the `targs` and `preds` of my and Lea's versions. Only when I was able to rule out any error on my side did it click.

This was a bitch to figure out - I know that `np.unique` sorts its output, but I never thought about its implications when looking at `separate_hist_future`.

cc @leabeusch (FYI) and @znicholls, who might find this interesting.
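To illustrate the TLDR - why a randomly ordered residual series underestimates the variability's influence - here is a sketch (illustrative only, not MESMER code): shuffling a serially correlated series destroys its lag-1 autocorrelation, so any AR-type fit on the shuffled series sees far less persistence than is actually there.

```python
import numpy as np

rng = np.random.default_rng(0)

# build a strongly autocorrelated AR(1) "residual variability" series
n = 10_000
resid = np.empty(n)
resid[0] = rng.standard_normal()
for t in range(1, n):
    resid[t] = 0.9 * resid[t - 1] + rng.standard_normal()

def lag1_autocorr(x):
    # correlation between the series and itself shifted by one step
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# mimic the re-ordering caused by the sorted np.unique output
shuffled = rng.permutation(resid)

print(lag1_autocorr(resid))     # close to 0.9
print(lag1_autocorr(shuffled))  # close to 0
```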