Closed by @danielballan 8 years ago
attn @ericdill
What is "bottleneck"?
https://pypi.python.org/pypi/Bottleneck
I think conda distributes it.
> On Wed, Feb 24, 2016, Eric Dill wrote: What is "bottleneck"?
Are you really sure you want to use it? .....
https://nbviewer.jupyter.org/gist/stuwilkins/beb8eefcea332a0f24f4
@vivekthampy most of the routines you will need are now in csxtools.
Your benchmarks show bottleneck ~20X slower than numpy. I wonder if something went wrong there. I ran bottleneck's own benchmarks, `bottleneck.bench()`, on srv1 just now. Results:
```
# Python 3.5
In [1]: import bottleneck as bn
In [2]: bn.bench()
Bottleneck performance benchmark
    Bottleneck 1.0.0
    Numpy (np) 1.10.2
    Speed is NumPy time divided by Bottleneck time
    NaN means approx one-third NaNs; float64 and axis=-1 are used

                no NaN      no NaN       NaN        NaN
                 (10,) (1000,1000)     (10,) (1000,1000)
    nansum        37.6        5.3       38.4        9.1
    nanmean      158.8        6.6      162.3       10.7
    nanstd       284.4        4.0      272.7        7.9
    nanvar       265.9        4.0      266.5        8.1
    nanmin        33.4        1.0       33.2        1.4
    nanmax        32.9        1.0       35.8        2.1
    median        56.4        0.9       75.3        1.1
    nanmedian     63.4        4.2       71.6        9.3
    ss            15.4        2.7       15.9        2.7
    nanargmin     65.8        6.6       66.4       11.9
    nanargmax     67.2        6.6       65.6       13.0
    anynan        13.6        0.7       14.1       73.6
    allnan        14.3       72.9       14.3       67.5
    (etc.)
```
Obvious observations: They are testing 1e6 numbers; you are testing 1e9. The shapes are also different. Could that really explain such a large discrepancy?
@danielballan It's a factor of 2 slower, not 20 .... 4.8 seconds vs 11 seconds.
I am no expert, not sure. However, it is hard to see how to screw that up in the test. I find it interesting that the speedup drops dramatically for `nanmean` from `(10,)` to `(1000,1000)`. Given there is a big difference between there being NaNs and not NaNs, I wonder how they implement it; I would instinctively think that having NaNs would make things faster!
Oops, yes of course. *pours first cup of coffee, hurriedly*
Going from 0 NaNs to at least 1 NaN requires more passes through the data in numpy's implementation, per @tacaswell.
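To illustrate the point (my own sketch, not numpy's actual internals): a NaN-aware reduction built out of plain numpy operations naturally becomes two passes over the data, one to locate the NaNs and one to reduce the valid entries:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, 5.0])

mask = np.isnan(a)            # pass 1: find the NaNs
total = a[~mask].sum()        # pass 2: reduce the valid entries
mean = total / (~mask).sum()

assert mean == np.nanmean(a)  # both give 3.0
```

A fused C loop can instead test each element for NaN as it accumulates, which is part of why bottleneck and csxtools can win here.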
Yes, this might be it. I think that when we have an image stack there are a number of safe assumptions: often there are more images than pixels, and we always (or at least in `get_images`) have a stack where the last two (shortest-stride) dimensions are the actual image. This means the implementation becomes a lot cleaner. For example, `stackmean` in csxtools makes these assumptions and then just tests for NaNs:
https://github.com/NSLS-II-CSX/csxtools/blob/master/src/image.c#L91
Also, fortunately the array is ordered in memory correctly for this approach. Looking at this function, it already does mean and sum; adding stddev would be fairly trivial (using the second-moment method @tacaswell suggested). What else do you think we need, @vivekthampy?
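For what it's worth, here is a pure-numpy sketch of that second-moment approach: a single pass over the frames accumulating per-pixel count, sum, and sum of squares, which is the same bookkeeping a C loop like `stackmean` could add. The function name and layout are my own, and note that the naive E[x^2] - E[x]^2 form can lose precision when values sit far from zero:

```python
import numpy as np

def stack_mean_std(stack):
    """Per-pixel mean and std over axis 0 of an (n, ny, nx) stack,
    skipping NaNs, in one pass over the frames."""
    count = np.zeros(stack.shape[1:])
    s = np.zeros(stack.shape[1:])
    s2 = np.zeros(stack.shape[1:])
    for frame in stack:                 # single pass through the images
        good = ~np.isnan(frame)
        count += good
        s += np.where(good, frame, 0.0)
        s2 += np.where(good, frame * frame, 0.0)
    mean = s / count
    var = s2 / count - mean ** 2        # second-moment formula
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Against `np.nanmean`/`np.nanstd` over axis 0 this agrees to floating-point tolerance for well-scaled data.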
Glad to see this implementation. Please tell us which additional functions are needed, and we can start adding them.
@stuwilkins This works great for the most part so far. One issue that needs to be addressed is dealing with scans where the number of images per event differs. This can arise either if a scan is aborted, or if you want to process together two different scans which have different numbers of images per event. The second case can be addressed by processing the scans separately and concatenating the image stacks, but that won't work in the first scenario.
At the moment, I deal with this by returning a list of 3D numpy arrays (one for each event), instead of a 4D numpy array. These are converted to a 3D stack using this function:
```python
import numpy as np

def convert_to_3d(images):
    # A list of per-event 3D stacks: stack them along the frame axis.
    if isinstance(images, list):
        ims = images[0]
        for im in images[1:]:
            ims = np.vstack((ims, im))
        return ims
    # Otherwise collapse a 4D (event, frame, ny, nx) array to 3D.
    images = np.asarray(images)
    if images.ndim < 4:
        return images
    shape = images.shape
    return np.reshape(images, (shape[0] * shape[1], shape[2], shape[3]))
```
There might be a more efficient way to do this. But something like this is required to pass a 3D stack of images to a function which calculates G2 for instance.
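One likely cheaper variant (a sketch under the same assumptions, not benchmarked on real data): `np.concatenate` handles the ragged-list case in a single allocation instead of repeated growing `vstack` copies, and a `-1` in `reshape` collapses the leading dimensions:

```python
import numpy as np

def convert_to_3d_fast(images):
    if isinstance(images, list):
        # One allocation instead of O(n) intermediate copies.
        return np.concatenate(images, axis=0)
    images = np.asarray(images)
    if images.ndim < 4:
        return images
    # (n_events, n_per_event, ny, nx) -> (n_events * n_per_event, ny, nx)
    return images.reshape(-1, *images.shape[-2:])
```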
Another useful function you might want to add is specifying an ROI for the images. This can actually be implemented in the get_fastccd_images function to process only the specified ROI instead of the whole image, which allows processing much bigger stacks of images. For instance, I use the function below to truncate the images before passing them to the correct_images function (the "rotated" option accounts for the ROI being specified in the proper image orientation rather than in the rotated raw-data format):
```python
def get_roi(images, roi=[0, 0, 960, 960], default_roi=[0, 0, 960, 960], rotated=False):
    if roi == default_roi:
        return images
    if rotated:
        roi = [960 - roi[3], roi[0], 960 - roi[1], roi[2]]
    if images.ndim == 3:
        images = images[:, roi[0]:roi[2], roi[1]:roi[3]]
    else:
        images = images[roi[0]:roi[2], roi[1]:roi[3]]
    return images
```
Also, it would be good to have a few simple functions like ccdtotal_per_event, ccdmean_per_event, stackmean_per_event and stacksum_per_event for quick plotting of motor scans etc.
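A minimal sketch of two of those per-event helpers (names and signatures are my guesses, assuming the list-of-3D-stacks layout discussed above, so ragged event lengths are fine):

```python
import numpy as np

def stackmean_per_event(event_stacks):
    # One mean image per event, skipping NaN (bad) pixels.
    return [np.nanmean(stack, axis=0) for stack in event_stacks]

def ccdtotal_per_event(event_stacks):
    # One scalar per event: counts summed over all frames and pixels,
    # handy for quickly plotting a motor scan.
    return np.array([np.nansum(stack) for stack in event_stacks])
```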
There is a streaming implementation of G2 which can take chunks at a time.
@vivekthampy so you are referring to one-time correlation? Can you clarify what you mean by a 3D stack of images? We have both one-time g2 and two-time g2 implementations, for instance.
Requested by @vivekthampy
P.S. Now you know where the 'wishlist' repo is, Vivek. :-)