astropy / imageutils

Image processing utilities for Astropy (No longer maintained)
https://imageutils.readthedocs.org/

downsample may not be fully optimized #28

Open · astrofrog opened 9 years ago

astrofrog commented 9 years ago

In APLpy I have a pure-Python (well, NumPy) implementation of downsample called resample, which I wrote a while ago, and it turns out it can actually be faster than the downsample here, which is surprising:

In [1]: from imageutils import downsample

In [2]: import numpy as np

In [4]: from aplpy.image_util import resample

In [6]: image = np.random.random((4096, 4096))

In [7]: %timeit resample(image, 10)
10 loops, best of 3: 73.6 ms per loop

In [8]: %timeit downsample(image, 10)
10 loops, best of 3: 85.2 ms per loop

I wonder if the loop order has something to do with it, or maybe the fact that we're still accessing a NumPy array from Cython.
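
One quick way to get a feel for how much access order matters on a given machine (a minimal check, not the benchmark from this thread) is to compare a contiguous reduction with a strided one on the same C-order array:

import timeit

import numpy as np

image = np.random.random((4096, 4096))  # C-order by default

# Summing along axis 1 reads contiguous memory; summing along axis 0
# strides across rows. The gap between the two is a rough proxy for
# how much loop/access order can matter for code like downsample.
for axis in (1, 0):
    t = timeit.timeit(lambda: image.sum(axis=axis), number=10) / 10
    print(f"axis={axis}: {t * 1000:.1f} ms")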

astrofrog commented 9 years ago

Just to be clear: I believe we should be able to do better than resample, but I was just surprised that a naive NumPy implementation was faster.

astrofrog commented 9 years ago

I looked into this a bit more, and for small arrays the current implementation is the most efficient. For large arrays (>10000x10000) the following implementation looks more efficient:

def downsample(image, factor):
    # Largest sub-image whose dimensions divide evenly by factor.
    nx_new = image.shape[1] // factor
    ny_new = image.shape[0] // factor
    nx_init = nx_new * factor
    ny_init = ny_new * factor
    # Trim any remainder pixels, split into factor x factor blocks,
    # then average within each block.
    return (image[:ny_init, :nx_init]
            .reshape((ny_new, factor, nx_new, factor))
            .mean(axis=3).mean(axis=1))
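
As a quick sanity check (an added example, not from the original comment), the reshape-and-mean trick really does compute per-block means on a tiny array:

import numpy as np

# Each 2x2 block of the 4x4 input should collapse to its mean.
image = np.arange(16, dtype=float).reshape(4, 4)
print(downsample(image, 2))
# [[ 2.5  4.5]
#  [10.5 12.5]]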

The timings are below; image_size is the side length of the square test image, and speedup is current / pure-numpy, so values above 1 mean the pure-NumPy version is faster:

   image_size    factor     current      pure-numpy       speedup
        10          2       0.004103       0.046804       0.087662
        10          3       0.004006       0.047328       0.084641
        10          5       0.004108       0.045531       0.090229
        10         10       0.003982       0.044102       0.090292
        10         20       0.004007       0.042776       0.093671
       100          2       0.023680       0.180519       0.131176
       100          3       0.020192       0.177550       0.113724
       100          5       0.018659       0.110810       0.168384
       100         10       0.016670       0.082870       0.201162
       100         20       0.016530       0.069902       0.236468
      1000          2       7.999492      11.448002       0.698768
      1000          3       3.766799       8.132291       0.463190
      1000          5       3.664613       5.139089       0.713086
      1000         10       2.858496       2.900219       0.985614
      1000         20       2.293301       1.844788       1.243124
     10000          2    2005.700111    1588.447094       1.262680
     10000          3     900.852919    1007.672071       0.893994
     10000          5     669.445992     625.443220       1.070355
     10000         10     476.181030     328.037024       1.451608
     10000         20     360.677004     220.120907       1.638540
     15000          2    4883.916855    3471.009970       1.407059
     15000          3    2229.454994    2352.994919       0.947497
     15000          5    1655.629158    1420.667887       1.165388
     15000         10    1157.596111     797.841072       1.450911
     15000         20     840.254068     500.336170       1.679379
     20000          2   11201.428175   12717.656136       0.880778
     20000          3    4786.141872    4208.330870       1.137302
     20000          5    3960.713148    2495.106936       1.587392
     20000         10    2380.532026    1388.871193       1.714005
     20000         20    1717.440844     853.933811       2.011211

Basically the speedup is up to a factor of 2, but only for very large images. We could add the pure-Python implementation as an alternative, or simply leave it out. I don't think an automated switchover between the two is necessarily sensible, because the crossover point might depend on the machine used.
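
For anyone who wants to see where the crossover falls on their own machine, a minimal timing sketch could look like the following (this is not the script that produced the table above, and it assumes the pure-NumPy downsample from the previous comment is already defined in the session):

import timeit

import numpy as np
from imageutils import downsample as downsample_cython

# Compare the Cython downsample against the pure-NumPy version
# (assumed to be defined as `downsample` in this session).
for size in (1000, 10000):
    image = np.random.random((size, size))
    for factor in (2, 10):
        t_cur = timeit.timeit(lambda: downsample_cython(image, factor), number=3) / 3
        t_np = timeit.timeit(lambda: downsample(image, factor), number=3) / 3
        print(f"{size:6d} {factor:4d}  speedup = {t_cur / t_np:.2f}")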

cdeil commented 9 years ago

The number of users who care about downsample speed is probably very small or zero (because basically any other image-processing task is more complex and slower) ... so I wouldn't put effort into this. But of course, if anyone needs or wants to implement something that is fast for large arrays, pull requests are very welcome.