facebookresearch / theseus

A library for differentiable nonlinear optimization
MIT License

suggest NonlinearLeastSquares not to enable 'while it_<num_iter' #616

Closed linjing-lab closed 1 year ago

linjing-lab commented 1 year ago

🚀 Feature

I'm familiar with gauss_newton and levenberg_marquardt; I implemented both of them in optimtool, released on PyPI. I suggest not enforcing a maximum number of iterations, because in my experience designing these methods the `_check_convergence` value plays the most important role in stopping the iteration, while `num_iter` has nothing to do with convergence.

Motivation

I think the Gauss-Newton method cannot guarantee an exact minimum number of iterations needed for convergence, so a fixed default parameter cannot express how many iterations convergence will take. My hope is that the nonlinear methods do not enforce [`while it_ < num_iter`](https://github.com/facebookresearch/theseus/blob/9a117fd02867c5007c6686e342630f110e488c65/theseus/optimizer/nonlinear/nonlinear_least_squares.py#L114), so that they become a reliably convergent black box that can be used as a core tool when evaluating PyTorch models. A rough sketch of the difference is shown below.
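For illustration only, here is a minimal sketch of the two control-flow styles being discussed. The names `step`, `check_convergence`, and `num_iter=20` are placeholders, not Theseus code; the point is just the bounded loop versus a purely convergence-driven one:

```python
def bounded_loop(x, step, check_convergence, num_iter=20):
    # Current style: stop after num_iter iterations even if not converged.
    for it_ in range(num_iter):
        x = step(x)
        if check_convergence(x):
            break
    return x

def convergence_only_loop(x, step, check_convergence):
    # Proposed style: iterate until the convergence check passes,
    # which may never terminate for ill-conditioned problems.
    while not check_convergence(x):
        x = step(x)
    return x
```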

Pitch

I hope these algorithms, whose convergence behavior varies with the input data, can measure the error of the models without being limited by a maximum number of iterations.

Alternatives

Alternatively, one could set `num_iter` to a much larger number, one that is definitely larger than the minimum number of iterations actually needed, to make sure the algorithms converge for any function that has an optimal point.

Additional context

mhmukadam commented 1 year ago

Thanks for the suggestion! It makes sense, but it is application dependent and impacts computation time and backward mode differentiation. Disabling `while it_ < num_iter` would not be possible, but as you suggest, on the user side you can always set a high value for max iterations to achieve this behavior.
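A minimal sketch of this user-side workaround, assuming the current theseus API; the toy squared-difference cost, the initial value, and the `max_iterations=500` setting are placeholders chosen only so that the iteration cap is far above what the problem needs:

```python
import torch
import theseus as th

# One optimization variable and a fixed target, wrapped in a simple
# squared-difference cost via AutoDiffCostFunction.
x = th.Vector(1, name="x")
target = th.Variable(torch.tensor([[5.0]]), name="target")

def err_fn(optim_vars, aux_vars):
    (x_var,) = optim_vars
    (target_var,) = aux_vars
    return x_var.tensor - target_var.tensor

cost = th.AutoDiffCostFunction([x], err_fn, 1, aux_vars=[target])
objective = th.Objective()
objective.add(cost)

# Setting max_iterations much higher than the expected number of steps
# effectively makes the convergence check the stopping criterion.
optimizer = th.GaussNewton(objective, max_iterations=500)
layer = th.TheseusLayer(optimizer)
solution, info = layer.forward({"x": torch.zeros(1, 1)})
```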

linjing-lab commented 1 year ago

I haven't tried setting a high value for max iterations, because in my own library I expose many hyperparameters for users to adjust, rather than reasoning about computation time and backward mode differentiation as you do; still, a strong combination of parameters, such as a higher learning rate and a bigger penalty factor, will accelerate the convergence of an optimization program.

luisenp commented 1 year ago

Thanks for your suggestion, @linjing-lab. Note that convergence to the optimum is not the only consideration when differentiating through an optimizer. Unrolling iterations is a very common form of backward mode differentiation, and in that case fixing the maximum number of iterations is a common choice, in part because the computation graph of every iteration must be kept in memory.
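As a rough illustration of the memory point (plain PyTorch, not Theseus code): when the backward pass is computed by unrolling, every iteration's intermediate tensors stay in the autograd graph until `backward()` runs, so memory grows with the iteration count. The toy inner problem and step size below are made up for the example:

```python
import torch

theta = torch.tensor(1.0, requires_grad=True)

def unrolled_solve(theta, num_iter=10):
    # Toy inner "optimizer": gradient steps on f(x) = (x - theta)^2.
    # Each iterate x remains part of the autograd graph, so memory
    # scales with num_iter when differentiating through the loop.
    x = torch.zeros(())
    for _ in range(num_iter):
        x = x - 0.1 * 2.0 * (x - theta)  # one gradient step, kept in the graph
    return x

loss = (unrolled_solve(theta) - 3.0) ** 2
loss.backward()  # backpropagates through all unrolled iterations
print(theta.grad)
```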

mhmukadam commented 1 year ago

Closing for now since the suggestion is applicable more as a user setting than a feature. Please reopen if you need to follow up.