Normalize ETA according to a complexity dependence on a variable

Rockhopper-Technologies / enlighten

Enlighten Progress Bar for Python Console Apps

https://python-enlighten.readthedocs.io

Mozilla Public License 2.0


Normalize ETA according to a complexity dependence on a variable #65

Open doronbehar opened 5 months ago

doronbehar commented 5 months ago

Is your feature request related to a problem? Please describe.

I'm using enlighten in a Python script that iterates over a variable N that is passed to an external program. Complexity theory predicts that the time the external program runs is linear in N.

If I understand correctly the way the ETA is calculated at the moment, it just averages the time it took for the previous iterations, and multiplies it by amount of iterations left.
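To make the concern concrete, here is a minimal sketch of the count-based estimate described above. It is not enlighten's actual implementation, only the "average time per finished iteration times iterations left" idea; the function name `naive_eta` and the workload (iteration N takes 0.01 * N seconds) are assumptions made for illustration.

```python
def naive_eta(elapsed, done, total):
    """Average time per finished iteration times the iterations left."""
    if done == 0:
        return float("inf")
    return elapsed / done * (total - done)


# Hypothetical workload: iteration N takes 0.01 * N seconds, for N = 1..100.
durations = [0.01 * n for n in range(1, 101)]

done = 50
elapsed = sum(durations[:done])          # time spent on iterations 1..50
true_remaining = sum(durations[done:])   # time iterations 51..100 will take
estimate = naive_eta(elapsed, done, len(durations))

print(f"estimated: {estimate:.2f}s, actual remaining: {true_remaining:.2f}s")
```

Halfway through the iteration count, the count-based estimate is well below the actual remaining time, because the later iterations are the expensive ones.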

Describe the solution you'd like

I'm looking for a way to tell enlighten in advance, before the main loop starts, the N of each iteration, so that it can take that into account when it computes the ETA.

Describe alternatives you've considered

None.

Additional context

The external program I run is called LAMMPS, which simulates classical particles interacting with one another. It uses the GPU, and hence the complexity should be linear in the total number of particles in the simulations.

Implementation idea

I was thinking that this new feature could be simply a new argument to enlighten.Counter, that should have a len() equal to total given to enlighten.Counter.

avylove commented 5 months ago

Thanks for reporting!

I think what this would look like is arguments to enlighten.Counter that specify optional functions for calculating the rate and eta. I can definitely see value in this. At one point I looked at making these all properties to allow easier customization, but it slowed calculations down more than I was comfortable with. Making these fields shouldn't affect the default because it's a simple test to see if they are not None.

You can do this today in a less direct manner by using a custom format and user-defined fields with something like the example below. Note how the values remain linear, but are calculated from the measured elapsed time, so you can account for externalities like system and network load that may affect the real time for each iteration.

import time
import enlighten
from enlighten._util import format_time

def func(N):
    """Simulate work with linear complexity"""
    time.sleep(0.01 * N)

max_N = 100
manager = enlighten.get_manager()
bar_format = '{desc}{desc_pad}{percentage:3.0f}%|{bar}| {count:{len_total}d}/{total:d} ' + \
             '[{elapsed}<{n_eta}, {n_rate:.2f} x N{unit_pad}s/{unit}]'
pbar = manager.counter(total=max_N, desc='Linear Complexity', unit='iteration', bar_format=bar_format)

for N in range(1, max_N + 1):
    func(N)

    elapsed = pbar.elapsed
    n_rate = elapsed / (N * (N + 1) / 2)
    n_eta = format_time((max_N - N) / 2 * (N + 1 + max_N) * n_rate)

    pbar.update(n_rate=n_rate, n_eta=n_eta)

doronbehar commented 5 months ago

Thanks for the example! It looks pretty straightforward. I am a bit confused, though, about how you calculated those N and total formulas 🤔.

If I understand correctly, at least for the linear case, it should be rather easy to add this kind of functionality to the counter, such that when updating, I can simply add an etaLinearityFactor parameter (of course the naming is open for discussion).

I think that even for non-linear cases, if such an etaLinearityFactor parameter were available as an argument to pbar.update, users would be able to calculate the etaLinearityFactor to pass by themselves, simply by using the same complexity function.

If such an etaLinearityFactor argument were available, here's how your example would look for a func with N Log(N) complexity:

import time
import enlighten
import numpy as np
from enlighten._util import format_time

def func(N):
    """Simulate work with non linear complexity"""
    time.sleep(0.01 * N * np.log(N))

total = 100
manager = enlighten.get_manager()
pbar = manager.counter(total=total, desc='N Log(N) Complexity', unit='iterations')

for N in range(1, total + 1):
    func(N)
    pbar.update(etaLinearityFactor=N*np.log(N))

avylove commented 5 months ago

Thanks for the example! It looks pretty straightforward. I am a bit confused, though, about how you calculated those N and total formulas 🤔.

Looking at it again, I made some mistakes in the example, so I fixed those, so hopefully it's clearer now. It does assume the values of N are consecutive and start at 1, so if that is not your case, you'd need to adjust the math.

n_rate = elapsed / (N * (N + 1) / 2)

I'm using N * (N + 1) / 2 to calculate 1 + 2 + 3 + ... + N. Dividing the elapsed time by that sum gives the average time an iteration should take when N = 1. If that is x, the calculated time for an iteration is x * N seconds.

n_eta = format_time((max_N - N) / 2 * (N + 1 + max_N) * n_rate)

The remaining values of N run from N + 1 to max_N. There are max_N - N of them, and their average is (N + 1 + max_N) / 2, so (max_N - N) / 2 * (N + 1 + max_N) is the sum of all the remaining values of N. That sum is then multiplied by the calculated time for an iteration where N = 1 to determine the remaining time.
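As a quick sanity check, the two closed forms can be compared against brute-force sums. This is only a sketch; the helper names `sum_processed` and `sum_remaining` are made up for illustration, and it assumes N runs over the consecutive integers 1..max_N as in the example.

```python
def sum_processed(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) / 2


def sum_remaining(n, max_n):
    """Closed form for (n + 1) + (n + 2) + ... + max_n."""
    return (max_n - n) / 2 * (n + 1 + max_n)


max_n = 100
for n in range(1, max_n + 1):
    # Compare each closed form with the explicit sum it stands for.
    assert sum_processed(n) == sum(range(1, n + 1))
    assert sum_remaining(n, max_n) == sum(range(n + 1, max_n + 1))

print("closed forms match the explicit sums for N = 1..100")
```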

pbar.update(etaLinearityFactor=N*np.log(N))

Your example is confusing because you didn't define a custom format that uses etaLinearityFactor. If you are trying to show what an example would look like if the requested feature was added, it would look like this:

pbar = manager.counter(total=total, desc='N Log(N) Complexity', unit='iterations', rate=custom_rate_function, eta=custom_eta_function)

doronbehar commented 5 months ago

Your example is confusing because you didn't define a custom format that uses etaLinearityFactor

That was intentional, as I was envisioning this new parameter to not be part of the bar_format, but rather taken into special consideration in enlighten.Counter, when the ETA and rate are calculated.

If you are trying to show what an example would look like if the requested feature was added, it would look like this:

Your example confused me :) as it's not clear what are the signatures of the custom_{rate,eta}_function...

avylove commented 5 months ago

That was intentional, as I was envisioning this new parameter to not be part of the bar_format, but rather taken into special consideration in enlighten.Counter, when the ETA and rate are calculated.

bar_format is how you can do this today. The code in that example can be run as is and you should be able to adapt it to your use case. Custom rate and eta functions do seem useful, but they will be generic, not tied to linear behavior, and the bar_format example should provide you a workaround until this feature is implemented.

Your example confused me :) as it's not clear what are the signatures of the custom_{rate,eta}_function...

That hasn't been determined yet; it's part of the work that would have to be done to implement this. It also would need error handling, documentation, and tests.

doronbehar commented 5 months ago

bar_format is how you can do this today. The code in that example can be run as is and you should be able to adapt it to your use case.

I understand 👍 Will update how it works for me :) (It's a bit non trivial in my case, because N is in a np.logspace and not a np.linspace).

Custom rate and eta functions do seem useful, but they will be generic, not tied to linear behavior,

I understand your thinking, and that's also good. What I was trying to say, however, is that even if you bound this new functionality to linear complexity only, the user would be able to supply a floating-point number that they calculate themselves according to whatever complexity model they assume.

avylove commented 5 months ago

I understand 👍 Will update how it works for me :) (It's a bit non trivial in my case, because N is in a np.logspace and not a np.linspace).

It should still be trivial, just a little slower. The only thing that changes is how you calculate the sum of N values already processed and the sum of remaining N values; everything else can stay the same.

n_rate = elapsed / SUM_OF_N_VALUES_PROCESSED
n_eta = format_time(SUM_OF_REMAINING_N_VALUES * n_rate)
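One way to get those two sums for arbitrary N values is to precompute running totals once, before the loop. The following is a sketch under assumptions: `make_eta_estimator` is a hypothetical helper, not part of enlighten, and the roughly log-spaced `n_values` and the elapsed time are made-up numbers. For a non-linear model, the list could hold a per-iteration cost such as N * log(N) instead of N itself.

```python
from itertools import accumulate


def make_eta_estimator(weights):
    """Build an estimator from the expected relative cost of each iteration."""
    prefix = list(accumulate(weights))  # prefix[i] = sum of weights[0..i]
    total = prefix[-1]

    def estimate(done, elapsed):
        """Return (seconds per unit of weight, remaining seconds).

        done is the number of finished iterations and must be at least 1.
        """
        processed = prefix[done - 1]          # SUM_OF_N_VALUES_PROCESSED
        rate = elapsed / processed
        return rate, (total - processed) * rate  # SUM_OF_REMAINING_N_VALUES * rate

    return estimate


# Hypothetical, roughly log-spaced values of N.
n_values = [10, 32, 100, 316, 1000]
estimate = make_eta_estimator(n_values)

# Suppose the first three iterations took 1.42 seconds in total.
rate, eta = estimate(done=3, elapsed=1.42)
print(f"{rate:.4f} s per unit of N, about {eta:.2f} s remaining")
```

The returned values could then be passed to pbar.update as n_rate and n_eta, with the ETA run through format_time as in the earlier example.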

I understand your thinking, and that's also good. What I was trying to say, however, is that even if you bound this new functionality to linear complexity only, the user would be able to supply a floating-point number that they calculate themselves according to whatever complexity model they assume.

That creates multiple problems. It makes enlighten more complex. Are we going to add another argument for every type of complexity? Choosing linear complexity is arbitrary; perhaps the complexity is logarithmic. The point is the default algorithms work >99% of the time, but sometimes people need a way to customize, so it makes sense to provide a path to do that. The other issue is you're thinking you should use the expected processing time for the progress bar, but that's not what you want in a progress bar. You want the real time, which can vary substantially from the expected time.