brian-team / brian2

Brian is a free, open source simulator for spiking neural networks.
http://briansimulator.org

Improve remaining time estimation #1173

Open mstimberg opened 4 years ago

mstimberg commented 4 years ago

Taking this out of #1162 since it is not only about standalone mode, quoting from there:

We might want to implement some simple(!) approach to improve the estimates of the remaining time (this would also apply to runtime mode). See e.g. https://stackoverflow.com/questions/933242/smart-progress-bar-eta-computation for some pointers.

mstimberg commented 4 years ago

@rahuliitg's comment in #1162:

Hi @mstimberg, I'm thinking of using linear regression to approximate the remaining time (http://code.activestate.com/recipes/578914-simple-linear-regression-with-pure-python/). I've seen other techniques as well, like this one: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

mstimberg commented 4 years ago

Hi @rahuliitg. I'm not quite sure what linear regression would look like in this case. My intuition is that it's rather more complicated than what we need. To make it clearer what the problem with the current approach is (which might still be the best solution, all things considered!), I wrote some code (in this gist). Here I am considering 4 scenarios:

  1. "constant": the simulation advances at constant speed
  2. "varying": the simulation speed varies up and down continuously
  3. "slow start": the simulation speed is slow for the first 50% of the total time and then speeds up for the remaining 50%
  4. "slow end": the inverse, the simulation is first fast then slow.

The line shows the accurate prediction of the remaining time (which is easy to calculate after a simulation has been run), and the dots show the prediction made by the estimate function, which implements our current algorithm: predict based on the total elapsed time and the simulation progress, e.g. if 40% of our simulation took 4s, we predict that we still have 6s to go.
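
For concreteness, here is a minimal sketch of this kind of linear extrapolation (the function name and signature are illustrative, not the actual code from the gist or from Brian):

```python
def estimate(elapsed, completed):
    """Estimate the remaining time from the total elapsed time (in seconds)
    and the completed fraction of the simulation (0-1)."""
    if completed <= 0:
        return float('nan')            # nothing to extrapolate from yet
    total = elapsed / completed        # projected total duration
    return total - elapsed             # e.g. 40% in 4s -> 10s total -> 6s left
```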

[Figure 1: accurate vs. estimated remaining time for the four scenarios, using the current algorithm]

Scenario 1 is trivial and the prediction is perfect. Scenario 2 is handled well; the estimates are slightly off, but not by much. Scenarios 3 and 4 are the ones that could be improved. Note that it is impossible to improve the estimates in the first half of the simulation, because we cannot look into the future and predict that the simulation speed will change. However, after the abrupt change in speed at 50%, the estimate over-/underestimates the remaining time until the end. A smarter algorithm might detect that the current speed is much faster/slower than the earlier speed and adapt the estimate more strongly. Note that to "solve" scenarios 3 and 4, you could go completely to the other extreme and use only the recent speed to make your estimate (see the sketch below). However, this would give bad results for scenario 2. This is what it would look like:

[Figure 2: accurate vs. estimated remaining time when using only the recent speed]
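
And a minimal sketch of that opposite extreme, extrapolating only from the speed observed since the last progress report (again with illustrative names):

```python
def estimate_recent(elapsed, completed, last_elapsed, last_completed):
    """Estimate the remaining time from the most recent progress interval only."""
    d_completed = completed - last_completed
    d_elapsed = elapsed - last_elapsed
    if d_completed <= 0 or d_elapsed <= 0:
        return float('nan')
    recent_speed = d_completed / d_elapsed   # fraction of the simulation per second
    return (1.0 - completed) / recent_speed
```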

I guess the best solution would be somewhere in between the two. Of course there are more scenarios that could be interesting to look at, e.g. a mostly constant speed with an intermediate period where the speed is faster/slower.

Hope that makes things a bit clearer. By plugging a different function in for estimate in my gist code, you can try out other approaches.

rahuliitg commented 4 years ago

@mstimberg I tried the following:

  1. Exponential moving average method, with the smoothing factor (alpha) = 0.1 (see the sketch after this list).

  2. Running moving average method.

Both gave only a slight improvement in results, for the initial data points.

  3. After detecting a slow start or slow end, I adjusted the smoothing factor accordingly and got good results for scenario 2 as well.
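
For illustration, a rough sketch of such an exponential-moving-average estimator (not @rahuliitg's actual code; the adaptive adjustment of alpha from point 3 is not shown):

```python
class EMAEstimator:
    """Exponential moving average of the simulation speed."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha             # smoothing factor
        self.speed = None              # smoothed speed (fraction of work per second)
        self.last_elapsed = 0.0
        self.last_completed = 0.0

    def update(self, elapsed, completed):
        """Feed a new (elapsed time, completed fraction) sample and
        return the estimated remaining time in seconds."""
        dt = elapsed - self.last_elapsed
        dp = completed - self.last_completed
        self.last_elapsed, self.last_completed = elapsed, completed
        if dt <= 0:
            return float('nan')
        current_speed = dp / dt
        if self.speed is None:
            self.speed = current_speed
        else:
            self.speed = self.alpha * current_speed + (1 - self.alpha) * self.speed
        if self.speed <= 0:
            return float('nan')
        return (1.0 - completed) / self.speed
```
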
zeph1yr commented 4 years ago

@mstimberg How suitable would it be for a package like Brian to forecast the remaining time rather than estimating it? This can be done by Monte Carlo simulation, and the results may look somewhat like this:

simulation may complete in 35m with 90% probability

mstimberg commented 4 years ago

Hi @zeph1yr. Something like this could indeed be interesting. I think the most useful option would be to give a range (e.g. the 95% interval) instead of a probability for a maximum time. So something like:

Estimated time remaining: 30m - 35m

By default, we print the progress every 10s, and this is the only time when we have access to the data points. This will not work well for a Monte Carlo/bootstrapping approach (not quite sure what the correct term is here), except for very long simulations. The best solution would probably be to separate the data sampling from the reporting, e.g. in addition to the report_period argument we could have a sample_period argument. Then we could sample the data every, say, 100ms, and calculate and print an estimate every 10s.
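
The proposed sample_period argument does not exist yet; a rough, self-contained sketch of the idea (illustrative names, not Brian's API) could look like this:

```python
import time

def run_with_sampling(step, n_steps, sample_period=0.1, report_period=10.0):
    """Advance a simulation step by step, sampling the speed every
    `sample_period` seconds but only reporting an estimate every
    `report_period` seconds."""
    samples = []                       # sampled speeds, in steps per second
    last_sample = last_report = time.time()
    last_step = 0
    for i in range(1, n_steps + 1):
        step()                         # advance the simulation by one time step
        now = time.time()
        if now - last_sample >= sample_period:
            samples.append((i - last_step) / (now - last_sample))
            last_sample, last_step = now, i
        if now - last_report >= report_period and samples:
            mean_speed = sum(samples) / len(samples)
            remaining = (n_steps - i) / mean_speed
            print(f"Estimated time remaining: {remaining:.1f} s "
                  f"(from {len(samples)} speed samples)")
            last_report = now
    return samples                     # could feed these into a bootstrap estimate
```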

Syed-Osama-Hussain commented 4 years ago

Hello @mstimberg. How can I extract the actual value of a Quantity object (without the unit attached)?

mstimberg commented 4 years ago

Hello @mstimberg. How can I extract the actual value of a Quantity object (without the unit attached)?

See https://brian2.readthedocs.io/en/stable/user/units.html#removing-units. If q is an array of values, the easiest is to use np.array(q); if it is a scalar value, you can use float(q).
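
For example:

```python
import numpy as np
from brian2 import second

q_scalar = 5 * second              # a scalar Quantity
q_array = np.arange(3) * second    # an array Quantity

print(float(q_scalar))             # 5.0 -- plain float, unit stripped
print(np.array(q_array))           # [0. 1. 2.] -- plain ndarray, unit stripped
```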

Syed-Osama-Hussain commented 4 years ago

Okay. Thanks. I've used exponential smoothing to estimate the remaining time. I get the following results:

[Figure: time estimation results]

zeph1yr commented 4 years ago

Hi @mstimberg, here's how an implementation of the bootstrap algorithm to forecast the remaining time could look:

https://gist.github.com/zeph1yr/90bf1dd5f5221290dfc1b3e18d8cbe1e

The upper bound of the confidence interval is 95%, and an estimated range is given as output. Waiting for your feedback.
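
For readers who don't follow the gist link, a generic sketch of such a bootstrap/percentile range (illustrative code, not the gist's implementation) could look like this:

```python
import numpy as np

def bootstrap_remaining(speeds, work_left, n_boot=1000, ci=0.95):
    """Return a (low, high) range for the remaining time.

    speeds    -- recently observed speeds (work units per second)
    work_left -- amount of work still to do, in the same work units
    """
    rng = np.random.default_rng()
    speeds = np.asarray(speeds, dtype=float)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        resampled = rng.choice(speeds, size=len(speeds), replace=True)
        estimates[i] = work_left / resampled.mean()
    lower = 100 * (1 - ci) / 2
    upper = 100 * (1 + ci) / 2
    return tuple(np.percentile(estimates, [lower, upper]))
```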

P.S. Here's how the random scenario was created: https://gist.github.com/zeph1yr/12358db51a1f8e945099abdc6c8cbc6c