Closed: whitehatty closed this issue 5 years ago
Agreed. Anyone want to make a PR? 🥇
I can after work
(reply sent by email to Ben Letham's comment of Oct 16, 2018)
Pushed to PyPI.
@bletham Can we reopen this? I guess some other PR removed the fix; in v0.4 and master it was at lines 1096-1104 of forecaster.py:
try:
    params = model.optimizing(**args)
except RuntimeError:
    if 'algorithm' not in args:
        # Fall back on Newton
        args['algorithm'] = 'Newton'
        params = model.optimizing(**args)
    else:
        raise
@whitehatty the behavior is a bit different, but it's now intentional and not a bug.
The bug was that previously, on a RuntimeError, we would repeat the optimization with the Newton optimizer like
model.optimizing(algorithm='Newton', **args)
which would raise a TypeError if we had previously specified the algorithm in args, because we were passing in a repeated keyword argument.
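For illustration, a minimal sketch of the failure mode, using a stand-in function rather than Prophet's actual Stan model (the real call is pystan's model.optimizing):

```python
def optimizing(algorithm="LBFGS", **kwargs):
    # Stand-in for the Stan model's optimizing method; just reports
    # which algorithm was used.
    return {"algorithm": algorithm}

args = {"algorithm": "BFGS"}

# The buggy retry passed 'algorithm' both explicitly and via **args,
# which duplicates the keyword argument and raises a TypeError:
try:
    optimizing(algorithm="Newton", **args)
    duplicated = False
except TypeError:
    duplicated = True

# The fixed retry mutates args instead, so the keyword appears only once:
args["algorithm"] = "Newton"
result = optimizing(**args)
```

Note that the duplicate keyword is only detected at call time, as a TypeError ("got multiple values for keyword argument 'algorithm'"), not at parse time.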
Now, if algorithm is specified, then it will be used in the first call to model.optimizing. If that errors out with a RuntimeError, then we do not replace the algorithm; rather, we let it raise. The thought here is that if the user goes to the effort of specifically setting algorithm='L-BFGS', then I don't think we should override that. If the user wants the fallback behavior they can leave the algorithm unspecified, or they can handle the RuntimeError in some different way if they want.
Does that seem reasonable, or is there a use case that I'm missing?
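To sketch the user-side fallback mentioned above: a user who pins an algorithm can still get retry behavior by catching the RuntimeError themselves. This is a hedged sketch with a stand-in fit function, not Prophet's real API; the failure is simulated:

```python
def fit(df, algorithm="LBFGS"):
    # Stand-in for a model's fit method; pretend the pinned algorithm
    # fails for this data so the fallback path is exercised.
    if algorithm == "BFGS":
        raise RuntimeError("optimization failed")
    return {"fitted_with": algorithm}

df = object()  # placeholder for the training dataframe

# User-side fallback: try the preferred algorithm first, then Newton.
try:
    params = fit(df, algorithm="BFGS")
except RuntimeError:
    params = fit(df, algorithm="Newton")
```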
My thought is that we should raise only if we can't fit the model at all, not if a specific algorithm fails, and this branch of code is invoked as a backup case anyway.
I am not sure why one would prefer an exception over a model fitted by the Newton method; the only reason I can see is that Newton is 10x slower (or some factor slower).
Can you provide any insight on the last two possible issues?
In my specific case, for example, I am explicitly setting BFGS because for some unknown reason it converges faster than L-BFGS (I'd also appreciate it if you have an explanation for that).
I see, so you're specifying BFGS, in which case it is reasonable to want to fall back to Newton. I was thinking that maybe you were specifying L-BFGS.
That makes sense, I opened a new issue in #877 to do this. (Since it is a slightly different issue than the one here, which was a bug).
As for BFGS requiring fewer iterations, that makes sense to me, since L-BFGS uses an approximate Hessian instead of the full one that BFGS uses, in order to reduce memory usage. I haven't personally tested the fitting with BFGS to see what the memory/speed tradeoff is (L-BFGS is Stan's default), but that seems worth doing.
In forecaster.py, lines 1046-1053: if kwarg algorithm is passed to fit and a RuntimeError is triggered, the call to optimizing will fail due to the duplicated kwarg algorithm. algorithm should be removed from kwargs before passing it to the call to optimizing.
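The fix the report suggests, removing the kwarg before the retry, could look like the following sketch. The optimizing function here is a stand-in that simulates a failure for any algorithm other than Newton; it is not Prophet's actual code:

```python
def optimizing(algorithm="LBFGS", **kwargs):
    # Stand-in for the Stan model's optimizing method; simulate
    # a RuntimeError for anything but Newton.
    if algorithm != "Newton":
        raise RuntimeError("line search failed")
    return {"algorithm": algorithm}

args = {"algorithm": "BFGS", "iter": 10000}

try:
    params = optimizing(**args)
except RuntimeError:
    # Remove the kwarg before retrying, so passing algorithm='Newton'
    # explicitly does not duplicate the keyword argument.
    args.pop("algorithm", None)
    params = optimizing(algorithm="Newton", **args)
```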