MESMER-group / mesmer

spatially-resolved ESM-specific multi-scenario initial-condition ensemble emulator
https://mesmer-emulator.readthedocs.io/en/latest/
GNU General Public License v3.0

better first guess for `fit_yeo_johnson_transform` #492

Open mathause opened 2 months ago

mathause commented 2 months ago

We can speed up `fit_yeo_johnson_transform` by passing a better first guess, assuming the trend is 0. We can get the first guess using:

```python
from sklearn.preprocessing import PowerTransformer

# fit the Yeo-Johnson lambda (ignoring any trend)
lmbdas = PowerTransformer().fit(tas_stacked_y.tas).lambdas_

# we can calculate xi_0 from lambda as
xi_0 = (2 - lmbdas) / lmbdas
```
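For reference, this conversion follows (if I read the parameterization correctly) from the logistic dependence of $\lambda$ on the local yearly temperature $T$:

$$\lambda(T) = \frac{2}{1 + \xi_0 \, e^{\xi_1 T}}$$

Setting $\xi_1 = 0$ (no trend) and solving for $\xi_0$ gives $\xi_0 = (2 - \lambda) / \lambda$, which is the conversion used above.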
veni-vidi-vici-dormivi commented 2 months ago

Hm, but instead of `tas_stacked_y.tas` we would use `resids_after_hm.tas[month]`, right? So the assumption would be that there is a skew of the monthly residuals w.r.t. the yearly values, but that it is constant and not dependent on the yearly temperature value. That's a good idea. But we would need to do it 12 times too. Does that pay off?
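A minimal sketch of what that per-month first guess could look like (the name `resids_after_hm` follows the discussion above and is an assumption, not existing MESMER code):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# hypothetical: resids_after_hm.tas[month] holds one month's residuals
# w.r.t. the yearly values; sklearn expects a 2D (samples x features) array
first_guess_xi_0 = {}
for month in range(12):
    resids = np.asarray(resids_after_hm.tas[month]).reshape(-1, 1)
    lmbda = PowerTransformer().fit(resids).lambdas_[0]
    # convert sklearn's lambda to xi_0, assuming no trend (xi_1 = 0)
    first_guess_xi_0[month] = (2 - lmbda) / lmbda
```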

mathause commented 2 months ago

> Hm, but instead of `tas_stacked_y.tas` we would use `resids_after_hm.tas[month]`, right?

Yes

> Does that pay off?

The idea is that there is not much trend, that it is much faster to fit one parameter than two, and that starting the minimization at a good point for $\xi_0$ speeds it up. It helps, but only by about 10%, so much less than I had hoped.

mathause commented 2 months ago

I could try again with much lower precision for the first guess, since most of the iterations are spent homing in on the estimate. The fit uses `sp.optimize.brent` with a tolerance of about 1e-8; for our purposes, 1e-2 is probably enough.

Only problem: the `tol` parameter is not exposed in `PowerTransformer().fit`.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brent.html
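One way around this (a sketch under the above assumptions, not what `fit_yeo_johnson_transform` currently does) would be to skip `PowerTransformer` and run the Brent minimization of the Yeo-Johnson log-likelihood directly, with a coarse tolerance:

```python
import numpy as np
from scipy import optimize, stats

def coarse_lambda_first_guess(data, tol=1e-2):
    """Roughly estimate the Yeo-Johnson lambda for 1D `data` and convert to xi_0."""

    # minimize the negative Yeo-Johnson log-likelihood, as PowerTransformer does
    # internally, but with a loose tolerance for a quick first guess
    def neg_llf(lmbda):
        return -stats.yeojohnson_llf(lmbda, data)

    lmbda = optimize.brent(neg_llf, brack=(-2, 2), tol=tol)

    # convert to xi_0, assuming no trend (xi_1 = 0)
    return (2 - lmbda) / lmbda
```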

Just for clarity: this yields at most another 10% speed gain, so it's still debatable whether it's worth the trouble.