MESMER-group / mesmer

spatially-resolved ESM-specific multi-scenario initial-condition ensemble emulator
https://mesmer-emulator.readthedocs.io/en/latest/
GNU General Public License v3.0
23 stars 17 forks source link

scipy.optimize in MESMER-M #428

Closed veni-vidi-vici-dormivi closed 1 month ago

veni-vidi-vici-dormivi commented 5 months ago

In MESMER-M we use scipy.optimize in

Imo this leads to several issues. Below I want to point out what those issues are. This might appear a little all over the place but I think that the use of scipy.optimize is the root of all those issues, which is why I wanted to summarize them here.

harmonic model

In harmonic_model.py we have basically one call that the user has to make: harmonic_model.fit_to_bic_xr (should maybe think of that name again). Within this, we

  1. look for the best order of the Fourier series by fitting the coefficients for each order. We get the coefficients for each order by optimizing the least squared error between the predictions and the true values using scipy.optimize.least_squares.
  2. After we got the optimal coefficients for each order we calculate again the MSE for the chosen set of coefficients to calculate the BIC of each order and safe it, to subsequently use the order with the smallest BIC.
  3. after we found the order, we currently fit for the coefficients again and retrieve the predictions. The output of this is one xarray dataset that contains the selected order (dims: gridcell), the fit coefficients (dims: gridcell, coefficients) and the predictions (dims: time, gridcell).

So we do a lot of the work double in here. I think we could avoid a lot of this by restructuring the code as such:

  1. We fit for new orders until we reach a local minimum of BIC
  2. For each new order we output and save (replace) the coefficients and predictions so that when we hit the local minimum, we can take these values right away. This is similar to what we do for the localization radii for the localized covariance I think. My understanding is that when we want to do it this way, we cannot use the scipy optimize functionality but have to do it ourselves. Why? Because, as I understand it the scipy optimize functions do not let us save any output that the optimized function might create. Here a question arises: Is it possible that even though this seems to be more efficient, the fact that we write it ourselves could end up making the process slower as before?

power transformer

In the power transformer we currently

  1. find the optimal coefficients for estimating lambda using pt.fit(). For this we minimize the log likelihood of the transformed data representing the true data with the lambda that is calculated using the coefficients (using scipy.optimize.minimize).
  2. We output the coefficients and now the user has to call pt.transform(temperature_data) to get the transformed temperatures. Within this we calculate the lambda and the transformed data again, after having calculated it already in the fit part. Again, I think this is because the scipy optimization does not let us save the data from within the optimized function. During all this, we never actually output the lambdas. Moreover, the user currently has to do this for every month individually and put together the data manually at the end and everything is still done on numpy arrays.

I am currently in the process of rewriting this for xarrays and am thinking to restructure the whole thing as such: Have one function for the user as for the harmonic model. This also means going back to functions and away from the power transformer class. I think this is the better option because then it is similar to what the user has to do for the harmonic model. This function would then

This again, would remove much of redundancy in the code but makes for a less nice separation of fitting and transforming in the code. I guess it would also be an option to leave the power transformer class and have the slight redundancy of recalculating the transformed values and just rewriting the power transformer to run on xarrays with minimal changes to all the other code. Then, it would be nicest though to also rewrite the harmonic model to a class... On the other hand it might also be possible to leave everything as is and have two different approaches for the harmonic model and the power transformer.