find_index and numpy.searchsorted

giumas commented 8 years ago

The function find_index (https://github.com/AllenDowney/ThinkDSP/blob/master/code/thinkdsp.py#L143) does the same as numpy.searchsorted (http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.searchsorted.html) but in a less efficient way.

I cannot find a reason to maintain find_index. Do you?

AllenDowney commented 8 years ago

find_index is constant time; searchsorted is logarithmic. Big enough difference to be worth maintaining? Maybe not.

On Fri, Feb 5, 2016 at 10:01 AM, giumas notifications@github.com wrote:

The function find_index ( https://github.com/AllenDowney/ThinkDSP/blob/master/code/thinkdsp.py#L143) does the same as numpy.searchsorted ( http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.searchsorted.html) but in a less efficient way.

I cannot find a reason to maintain find_index. Do you?

— Reply to this email directly or view it on GitHub https://github.com/AllenDowney/ThinkDSP/issues/18.

giumas commented 8 years ago

Did you try to time it?

In [1]: 
import numpy as np
def find_index(x, xs):
    """Find the index corresponding to a given value in an array."""
    n = len(xs)
    start = xs[0]
    end = xs[-1]
    i = round((n-1) * (x - start) / (end - start))
    return int(i)
test_array = np.array(range(99999))/1.0
print(test_array)

[  0.00000000e+00   1.00000000e+00   2.00000000e+00 ...,   9.99960000e+04
   9.99970000e+04   9.99980000e+04]

In [2]: 
%timeit find_index(50000, test_array)

The slowest run took 8.06 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.16 µs per loop

In [3]: 
%timeit test_array.searchsorted(50000)

The slowest run took 25.84 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 502 ns per loop

In [4]: 
%timeit find_index(0, test_array)

The slowest run took 9.03 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.14 µs per loop

In [5]: 
%timeit test_array.searchsorted(0)

The slowest run took 19.14 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 505 ns per loop

In [6]: 
%timeit find_index(1000000, test_array)

The slowest run took 9.18 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.18 µs per loop

In [7]: 
%timeit test_array.searchsorted(1000000)

The slowest run took 16.96 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 516 ns per loop

There could be some optimization in numpy.searchsorted and less overhead.. just a guess.

AllenDowney / ThinkDSP

find_index and numpy.searchsorted #18