harmonic model: only fit orders up to local minimum and use coeffs from previous order as first guess

veni-vidi-vici-dormivi commented 1 month ago

[x] Closes #422
[ ] Fully documented, including CHANGELOG.rst

As already mentioned in the issue above, we can use the fit from the previous order as the first guess for the next order. Moreover, we can only fit orders until we reach a local minimum instead of fitting for all orders and then selecting the one with the smallest loss. This brings down the time needed to fit the harmonic model from 40.8 seconds to 8.9 seconds for a 250 years x 118 gridpoint dataset.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.77%. Comparing base (9b0b76b) to head (7f991da). Report is 47 commits behind head on main.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #443 +/- ## ========================================== - Coverage 87.90% 83.77% -4.14% ========================================== Files 40 44 +4 Lines 1745 1954 +209 ========================================== + Hits 1534 1637 +103 - Misses 211 317 +106 ``` | [Flag](https://app.codecov.io/gh/MESMER-group/mesmer/pull/443/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MESMER-group) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/MESMER-group/mesmer/pull/443/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MESMER-group) | `83.77% <100.00%> (-4.14%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=MESMER-group#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

veni-vidi-vici-dormivi commented 1 month ago

We could also write a function for finding the local minimum similar to minimize_local_discrete. I wondered if we could use this one here but I would like to preserve all the function output of fit_fourier_series_np and minimize_local_discrete is not made for that, and we also don't need it in its original use case.

What do you think @mathause, leave it like this, factor it out to a separate minimize function, rewrite the original minimize_local_discrete function or use the origin minimize_local_discrete function and get the coeffs and predictions in another call?

mathause commented 1 month ago

We could also write a function for finding the local minimum similar to minimize_local_discrete. I wondered if we could use this one here but I would like to preserve all the function output of fit_fourier_series_np and minimize_local_discrete is not made for that, and we also don't need it in its original use case. What do you think @mathause, leave it like this, factor it out to a separate minimize function, rewrite the original minimize_local_discrete function or use the origin minimize_local_discrete function and get the coeffs and predictions in another call?

Careful - you can only abort the loop early if it is strictly monotonic - can you confirm this?
Yes, would be nice if we could use minimize_local_discrete but I don't see an easy way, so lets stay here.

mathause commented 1 month ago

(I should maybe not have reviewed this PR, but thought it would be an easy win...)

veni-vidi-vici-dormivi commented 1 month ago

the way it is currently set up (returning the loss and not the residuals), loss="cauchy" has no effect - can you confirm? (also cauchy does not seem to be recommended)

Actually, no, it does have an influence... could it be that this is actually wrong? The documentation says func should return the residuals. Could it be that in this way the loss is interpreted as a residual?

mathause commented 1 month ago

I agree and think returning the residuals is the correct thing..., however, this sentence in the docstring had me assume returning the loss is fine as well. https://github.com/scipy/scipy/blob/7dcd8c59933524986923cde8e9126f5fc2e6b30b/scipy/optimize/_lsq/least_squares.py#L265

It must allocate and return a 1-D array_like of shape (m,) or a scalar.

I assumed loss="cauchy" would have no influence as np.log1p(z) should be monotonic for any scalar z (is that the right word?). I.e. $x > y \implies \ln{}(x) > \ln{}(y)$. However, the first and second derivative also flows into that (rho[1] and rho[2]) - maybe that has an influence. Or is it the $+1$ part of np.log1p(z)?

https://github.com/scipy/scipy/blob/7dcd8c59933524986923cde8e9126f5fc2e6b30b/scipy/optimize/_lsq/least_squares.py#L190

veni-vidi-vici-dormivi commented 1 month ago

Careful - you can only abort the loop early if it is strictly monotonic - can you confirm this?

Empirically - yes, mathematically - 😬 No?.

But logically, since we assume that underlying our data is a "pretty" function, i.e. an oscillating curve with a period of 12, I think it is fair to assume that each additional order brings us closer to the actual curve, and that the residuals decrease monotonically.

veni-vidi-vici-dormivi commented 1 month ago

I assumed loss="cauchy" would have no influence as np.log1p(z) should be monotonic for any scalar z (is that the right word?). I.e. . However, the first and second derivative also flows into that (rho[1] and rho[2]) - maybe that has an influence. Or is it the part of np.log1p(z)?

Hm the first derivative is not monotonic so maybe. I am not exactly sure why it comes to different results. I made a few observations:

returning the loss and using cauchy or not leads to different coefficients at an order of $10^{-5}$ and different predictions at $10^{-4}$ and the returned MSE is the same for both cases (at least up to $10^{-8}$).
returning the residuals without using cauchy (using just the default loss function so linear), the coefficients and predictions are similar to the ones when returning the loss ($10^{-4}$ and $10^{-3}$ ) also with the same MSE as the other two (1.012).
returning the residuals and using cauchy yields quite different coefficients (differing to the others at $10^{-3}$ and predictions at $10^{-1}$ and a greater MSE than all the others (1.016).
returning the residuals and using Huber instead of cauchy gets us an MSE of 1.014.

What it does in the background when we return residuals is apply the loss function to the residuals before computing MSE, and cauchy dampens outliers. I'm not sure though what it does when func returns a scalar. Thus I would say returning the residuals and using linear loss func is the best option.

However I found it nice to have the MSE of the chosen parameters returned by the optimize result, which we lose when we do it like that. OptimizeResult contains the final value of the cost function which is however MSE(lossfunc(resids)) and not MSE(resids). Computationally though it's not so bad I guess calculating the MSE twice. However, in that case I would find it prettier to pass the residuals to calculate_bicand calculate MSE inside of it instead of calculating the MSE in fit_to_bic between fitting the Fourier series and calculating the BIC.

Ok let's add the two TODO comments and merge.

I'll add the TODOs (including some from the stuff above) but I want to wait with merging after we merged the tests.

MESMER-group / mesmer

harmonic model: only fit orders up to local minimum and use coeffs from previous order as first guess #443

Codecov Report