Closed by @danielballan 8 years ago
attn @ericdill
What is "bottleneck"?
https://pypi.python.org/pypi/Bottleneck
I think conda distributes it.
> On Wed, Feb 24, 2016, Eric Dill wrote: What is "bottleneck"?
Are you really sure you want to use it? .....
https://nbviewer.jupyter.org/gist/stuwilkins/beb8eefcea332a0f24f4
@vivekthampy most of the routines you will need are now in csxtools.
Your benchmarks show bottleneck ~20X slower than numpy. I wonder if something went wrong there. I ran bottleneck's own benchmarks, `bottleneck.bench()`, on srv1 just now. Results:
```
# Python 3.5
In [1]: import bottleneck as bn
In [2]: bn.bench()
Bottleneck performance benchmark
    Bottleneck 1.0.0
    Numpy (np) 1.10.2
    Speed is NumPy time divided by Bottleneck time
    NaN means approx one-third NaNs; float64 and axis=-1 are used

                no NaN      no NaN       NaN        NaN
                 (10,) (1000,1000)     (10,) (1000,1000)
    nansum        37.6        5.3       38.4        9.1
    nanmean      158.8        6.6      162.3       10.7
    nanstd       284.4        4.0      272.7        7.9
    nanvar       265.9        4.0      266.5        8.1
    nanmin        33.4        1.0       33.2        1.4
    nanmax        32.9        1.0       35.8        2.1
    median        56.4        0.9       75.3        1.1
    nanmedian     63.4        4.2       71.6        9.3
    ss            15.4        2.7       15.9        2.7
    nanargmin     65.8        6.6       66.4       11.9
    nanargmax     67.2        6.6       65.6       13.0
    anynan        13.6        0.7       14.1       73.6
    allnan        14.3       72.9       14.3       67.5
    (etc.)
```
Obvious observations: They are testing 1e6 numbers; you are testing 1e9. The shapes are also different. Could that really explain such a large discrepancy?
@danielballan It's a factor of 2 slower, not 20 .... 4.8 seconds vs 11 seconds.
I am no expert, not sure. However, it is hard to see how to screw that up in the test. I find it interesting that the speedup drops dramatically for `nanmean` from `(10,)` to `(1000,1000)`. Given there is a big difference between there being NaNs and not NaNs, I wonder how they implement it; I would instinctively think that having NaNs would make things faster!
Oops, yes of course. *pours first cup of coffee, hurriedly*
Going from 0 NaNs to at least 1 NaN requires more passes through the data in numpy's implementation, per @tacaswell.
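To illustrate the point (my own sketch, not numpy's actual internals): a NaN-aware reduction built out of plain numpy operations naturally becomes two passes over the data, one to locate the NaNs and one to reduce the valid entries:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, 5.0])

mask = np.isnan(a)            # pass 1: find the NaNs
total = a[~mask].sum()        # pass 2: reduce the valid entries
mean = total / (~mask).sum()

assert mean == np.nanmean(a)  # both give 3.0
```

A fused C loop can instead test each element for NaN as it accumulates, which is part of why bottleneck and csxtools can win here.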
Yes, this might be it. I think that when we have an image stack there are a number of safe assumptions: often there are more images than pixels, and we always (or at least in `get_images`) have a stack where the last two (shortest-stride) dimensions are the actual image. This means the implementation becomes a lot cleaner. For example, `stackmean` in csxtools makes these assumptions and then just tests for NaNs:
https://github.com/NSLS-II-CSX/csxtools/blob/master/src/image.c#L91
Also, fortunately the array is ordered in memory correctly for this approach. Looking at this function, it already does mean and sum; adding stddev would be fairly trivial (using the second-moment method @tacaswell suggested). What else do you think we need, @vivekthampy?
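For what it's worth, here is a pure-numpy sketch of that second-moment approach: a single pass over the frames accumulating per-pixel count, sum, and sum of squares, which is the same bookkeeping a C loop like `stackmean` could add. The function name and layout are my own, and note that the naive E[x^2] - E[x]^2 form can lose precision when values sit far from zero:

```python
import numpy as np

def stack_mean_std(stack):
    """Per-pixel mean and std over axis 0 of an (n, ny, nx) stack,
    skipping NaNs, in one pass over the frames."""
    count = np.zeros(stack.shape[1:])
    s = np.zeros(stack.shape[1:])
    s2 = np.zeros(stack.shape[1:])
    for frame in stack:                 # single pass through the images
        good = ~np.isnan(frame)
        count += good
        s += np.where(good, frame, 0.0)
        s2 += np.where(good, frame * frame, 0.0)
    mean = s / count
    var = s2 / count - mean ** 2        # second-moment formula
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Against `np.nanmean`/`np.nanstd` over axis 0 this agrees to floating-point tolerance for well-scaled data.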
Glad to see this implementation. Please tell us which additional functions are needed, and we can start adding them.
@stuwilkins This works great for the most part so far. One issue that needs to be addressed is dealing with scans where the number of images per event differs. This can arise either if a scan is aborted, or if you want to process together two different scans which have different numbers of images per event. The second case can be addressed by processing the scans separately and concatenating the image stacks, but that won't work in the first scenario.
At the moment, I deal with this by returning a list of 3D numpy arrays (one for each event), instead of a 4D numpy array. These are converted to a 3D stack using this function:
```python
import numpy as np

def convert_to_3d(images):
    # A list of per-event 3D stacks: stack them along the frame axis.
    if isinstance(images, list):
        ims = images[0]
        for im in images[1:]:
            ims = np.vstack((ims, im))
        return ims
    # Otherwise collapse a 4D (event, frame, ny, nx) array to 3D.
    images = np.asarray(images)
    if images.ndim < 4:
        return images
    shape = images.shape
    return np.reshape(images, (shape[0] * shape[1], shape[2], shape[3]))
```
There might be a more efficient way to do this. But something like this is required to pass a 3D stack of images to a function which calculates G2 for instance.
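One likely cheaper variant (a sketch under the same assumptions, not benchmarked on real data): `np.concatenate` handles the ragged-list case in a single allocation instead of repeated growing `vstack` copies, and a `-1` in `reshape` collapses the leading dimensions:

```python
import numpy as np

def convert_to_3d_fast(images):
    if isinstance(images, list):
        # One allocation instead of O(n) intermediate copies.
        return np.concatenate(images, axis=0)
    images = np.asarray(images)
    if images.ndim < 4:
        return images
    # (n_events, n_per_event, ny, nx) -> (n_events * n_per_event, ny, nx)
    return images.reshape(-1, *images.shape[-2:])
```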
Another useful function you might want to add is specifying an ROI for the images. This can actually be implemented in the get_fastccd_images function to process only the specified ROI instead of the whole image, which allows processing much bigger stacks of images. For instance, I use the function below to truncate the images before passing them to the correct_images function (the "rotated" option accounts for the ROI being specified in the proper image orientation rather than in the rotated raw-data format):
```python
def get_roi(images, roi=[0, 0, 960, 960], default_roi=[0, 0, 960, 960], rotated=False):
    if roi == default_roi:
        return images
    if rotated:
        roi = [960 - roi[3], roi[0], 960 - roi[1], roi[2]]
    if images.ndim == 3:
        images = images[:, roi[0]:roi[2], roi[1]:roi[3]]
    else:
        images = images[roi[0]:roi[2], roi[1]:roi[3]]
    return images
```
Also, it would be good to have a few simple functions like ccdtotal_per_event, ccdmean_per_event, stackmean_per_event and stacksum_per_event for quick plotting of motor scans etc.
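A minimal sketch of two of those per-event helpers (names and signatures are my guesses, assuming the list-of-3D-stacks layout discussed above, so ragged event lengths are fine):

```python
import numpy as np

def stackmean_per_event(event_stacks):
    # One mean image per event, skipping NaN (bad) pixels.
    return [np.nanmean(stack, axis=0) for stack in event_stacks]

def ccdtotal_per_event(event_stacks):
    # One scalar per event: counts summed over all frames and pixels,
    # handy for quickly plotting a motor scan.
    return np.array([np.nansum(stack) for stack in event_stacks])
```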
There is a streaming implementation of G2 which can take chunks at a time.
@vivekthampy so you are referring to one-time correlation? Can you clarify what you mean by a 3D stack of images? We have both one-time g2 and two-time g2 implementations, for instance.
Requested by @vivekthampy
P.S. Now you know where the 'wishlist' repo is, Vivek. :-)