Feat: Unimproved score calculation count termination

storm-TOS commented 1 month ago

Is there a reason why this has not been implemented yet? As far as I am aware, the following terminations are available:

	Total	Unimproved	Unimproved with score difference threshold
Time	Yes	Yes	Yes
Step count	Yes	Yes	No
Score calculation count	Yes	No	No

For optimization to be as fast as possible without sacrificing final solution quality, our product relies crucially on unimproved time spent terminations. We use a sequence of short solver phases with very different configurations. If one of the earlier phases terminates too early, the final solution quality degrades severely.

Unfortunately, time based limits cause the solving process to be very dependent on hardware, resource availability, use of multi-threading and node sharing, JVM tweaks, etc. Unimproved score calculation count termination would offer much more reliability.

Since this feature would be so useful to us, I could make a case for implementing it myself. Let me know if you are interested. (I will only be able to respond after next week, though.)

triceo commented 1 month ago

Hello @storm-TOS, and thanks for reporting.

For the use case you described, wouldn't it make more sense to terminate on unimproved step count? That is not only independent of hardware, but also independent of all sorts of solver configuration tweaks.

Depending on something as specific as unimproved score calculation count is not something I can recommend.

storm-TOS commented 1 month ago

Hi @triceo, thanks for sharing your insights.

I did consider step count limits, but I think it would not work well in our case. I just solved the same problem twice, and these were the score calculation speeds (SCS) and step counts for each of the six phases:

# First run
(84823/sec), step total (790).
(155846/sec), step total (322).
(85206/sec), step total (3268).
(163785/sec), step total (1030).
(177257/sec), step total (129).
(161201/sec), step total (291).

# Second run
(78865/sec), step total (796).
(156458/sec), step total (865).
(73480/sec), step total (3574).
(169901/sec), step total (246).
(181659/sec), step total (169).
(166378/sec), step total (73).

As you can see, the step counts vary greatly between corresponding phases (e.g. 1030 vs 246 in the fourth phase), whereas the SCS is quite stable. (The phases are hill climbing, tabu, and alternately great deluge and simulated annealing; the final two of the six phases are admittedly mostly ornamental.)

Of course, the SCS does vary with the problem input, but in our case not shockingly.

Something else that may play a role is that we're doing a form of real time planning, in which the planning facts only undergo small changes, but progressively more entities get pinned. This means that score calculation is relatively stable across runs during the day, while both the doable moves and the improving moves decrease drastically in number (because of pinning and, presumably, because of solution convergence/diminished returns). We would like to use the same configuration for each run (except the very first), but terminating on step limits might get the solver stuck on an early phase for too long.

I would also argue that using score calculation count limits instead of time limits is more reliable when adding/modifying constraints; this might lower the SCS, but you'd want to try out the same number of moves (if not more), assuming solution quality is the primary concern.

Finally, it is easy enough to tweak any limits if we change the solver configuration at any point, but it is much more inconvenient to scale the time limits whenever something about the environment changes.

triceo commented 1 month ago

I would also argue that using score calculation count limits instead of time limits is more reliable when adding/modifying constraints; this might lower the SCS, but you'd want to try out the same number of moves (if not more), assuming solution quality is the primary concern.

This, I think, is the crux of the issue - that the number of score calculations equals the number of moves. We do not want our users to ever rely on that, because some of the future cool features are likely to break this assumption; such as ruin & recreate moves, or guided local search.

There may come a time when a move will result in more than one score calculation. In fact, it could even be true today, because we've never guaranteed the 1:1 ratio. As far as we're concerned, score calculation speed should not be understood as a measure of solver progress, only as a measure of solver performance.

storm-TOS commented 1 month ago

Interesting to hear about those upcoming features! And I totally understand you don't want to mislead users into relying too much on score calculation measures.

I will just leave these final considerations:

Score calculation count is no less of a measure of solver progress than time spent. It may not be possible to make promises about the number of score calculations per move, but this is definitely impossible for time spent per move. Which metrics are most important, and which types of termination serve them best, will vary per user. I can see how most might prioritize predictable/stable solving times, but this is not the case for us.
Total score calculation count termination is already there anyway.
One could consider supporting move count termination instead.

Feel free to close this issue if your conclusion is that this feature would be more harmful than beneficial. I could always fork!

In any case, thank you for your time.

triceo commented 1 month ago

@storm-TOS You bring interesting points. We'll think about it a bit more, maybe there is a better way of getting you what you need.

storm-TOS commented 1 month ago

Sounds good!

triceo commented 1 month ago

Brief notes from a discussion with some of my colleagues:

The split between move count and score count is real; it will happen in the relatively near future.
Unimproved score calculation count termination is unlikely to happen.
We'll consider doing terminations on move count, although it's not on top of list of priorities. Pull requests are welcome!

storm-TOS commented 1 month ago

Thanks for the update! I'll see if I can spare the time.

TimefoldAI / timefold-solver

Feat: Unimproved score calculation count termination #995