NSLS-II / wishlist

an issue tracker for the big picture

Implement a range of lazy-loading options for EigerHandler #74

Closed danielballan closed 8 years ago

danielballan commented 9 years ago

Quoting an email from @yugangzhang

But, it will be very helpful if you can modify the eiger code in the following way.

Now:

e.g., from eiger_io.pims_reader import EigerImages as Images
hdr = db[...]
img = Images(...)

In some cases, Images will load the whole data set. You will see the problem if you try to load the data with uid= 95ec94a7-e74e ..., which was collected last Friday night (10/16/2015).

Immediate reactions:

  1. @yugangzhang, I never intended for you to use from eiger_io.pims_reader import EigerImages as Images. Why are you doing that? There is from databroker import get_images.
  2. The only change one needs to make is in the handler code. The pims object is cast to a numpy array by a call to np.array. Simply removing that call will do what I think @yugangzhang wants.

ericdill commented 9 years ago

@danielballan The shape of the data that @yugangzhang is getting from filestore is 'shape': [3000, 2167, 2070]. The EigerImages pims reader is correctly returning the lazily loaded data set. The problem is that there are only two data points in the data set and each is 3000x2167x2070. I'm pretty sure that we would need to write a new handler that lazily loads each of the 3000 frames independently.

If you already knew that, sorry :cry:

danielballan commented 9 years ago

OK, that sounds right. h5py makes it straightforward to load partial data sets, so we can certainly write a lazier handler.
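
A minimal sketch of what a partial read looks like with h5py, assuming h5py and numpy are installed. The file name, the dataset name "data", and the tiny shape are stand-ins for illustration and are not the Eiger file layout.

```python
import h5py
import numpy as np

# In-memory HDF5 file so the sketch leaves nothing on disk.
# "example.h5" and the dataset name "data" are illustrative, not the Eiger layout.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Small stand-in for a (frames, rows, cols) image stack.
    stack = np.arange(4 * 3 * 5, dtype=np.uint32).reshape(4, 3, 5)
    dset = f.create_dataset("data", data=stack)

    # The shape is metadata; asking for it reads no pixel data.
    whole_shape = dset.shape

    # Indexing an h5py Dataset reads only the requested selection.
    first_frame = dset[0]        # one full frame
    corner = dset[2, :2, :2]     # part of one frame

    frame_shape = first_frame.shape
    corner_values = corner.tolist()
```

A lazier handler could hand back something that defers this kind of indexing until a frame is actually requested, rather than converting the whole stack to an array up front.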

yugangzhang commented 9 years ago

@ericdill that's what I meant.

danielballan commented 9 years ago

@yugangzhang See how far you get with this, reading partial data sets. Read the documentation for h5py.

yugangzhang commented 9 years ago

?

danielballan commented 9 years ago

@yugangzhang http://docs.h5py.org/en/latest/high/dataset.html#reading-writing-data

yugangzhang commented 9 years ago

Thanks! It helps.

danielballan commented 9 years ago

@sameera2004 reports that you are disappointed in the slowness of these two lines:

imgs = get_images(...)
imgs[0]

Just to be sure we're on the same page, this is slow because each individual frame of imgs is a huge cube, ~ 1000^3 pixels, so even accessing the very first frame is costly.
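
To put a rough number on it, here is the arithmetic for the shape quoted earlier in this thread, which comes to about 1.3 x 10^10 pixels per data point. The bytes per pixel are an assumption, since the thread does not state the dtype.

```python
# Shape of one data point, as quoted earlier in this thread.
frames, rows, cols = 3000, 2167, 2070

pixels_per_frame = rows * cols
pixels_per_point = frames * pixels_per_frame

# Bytes per pixel is an assumption; the thread does not state the dtype.
for bytes_per_pixel in (2, 4):
    gib = pixels_per_point * bytes_per_pixel / 2**30
    print(f"{bytes_per_pixel} bytes/pixel: {gib:.1f} GiB per data point")
```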

I suggested a LazyEigerHandler that can load partial frames. I think you, @yugangzhang, can lead the way on this. I'm happy to provide more guidance as needed.
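
A rough sketch of the idea, not the eiger-io implementation. LazyFrames and CountingSource are hypothetical names: the first wraps any 3D source whose indexing reads a single frame (an h5py Dataset behaves this way), and the second is a stand-in that records which frames were touched.

```python
class LazyFrames:
    """Sequence-like view that reads one frame at a time from a 3D source.

    `source` is any object with a `.shape` of (frames, rows, cols) whose
    integer indexing reads only the requested frame.
    Hypothetical helper, not the eiger-io implementation.
    """

    def __init__(self, source):
        self._source = source

    def __len__(self):
        return self._source.shape[0]

    def __getitem__(self, i):
        n = len(self)
        if not -n <= i < n:
            raise IndexError(i)
        # Only this one frame is requested from the underlying source.
        return self._source[i % n]


class CountingSource:
    """Stand-in for an on-disk stack that records which frames were read."""

    def __init__(self, n_frames, rows, cols):
        self.shape = (n_frames, rows, cols)
        self.reads = []

    def __getitem__(self, i):
        self.reads.append(i)
        # Fake frame: every pixel holds the frame index.
        return [[i] * self.shape[2] for _ in range(self.shape[1])]


source = CountingSource(3000, 2, 3)
imgs = LazyFrames(source)

first = imgs[0]    # touches frame 0 only
last = imgs[-1]    # touches frame 2999 only
```

With this shape of object, imgs[0] costs one frame read instead of the whole cube; a handler could return such a view per data point instead of a fully loaded array.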

yugangzhang commented 9 years ago

I will get to this issue soon.

tacaswell commented 9 years ago

@heroux This is also of interest to the *MX folks who have eigers, please forward this to the correct people on those beamlines.

danielballan commented 9 years ago

I wrote a working prototype that @sameera2004 has in some of her notebooks. We will give it a permanent home in the NSLS-II/eiger-io repo.

Sameera, I originally said I would do this but I'm swamped. Any chance you could make a PR out of the code in your notebook? It can be pasted as-is, I think, into eiger_io/fs_handlers. Then I will take a look at it.

sameera2004 commented 9 years ago

@danielballan sure I will create a PR

sameera2004 commented 9 years ago

@danielballan I created PR #4 in eiger_io/fs_handlers: https://github.com/NSLS-II-CHX/eiger-io/pull/4

cowanml commented 9 years ago

On Wed, 4 Nov 2015, Thomas A Caswell wrote:

@heroux This is also of interest to the *MX folks who have eigers, please forward this to the correct people on those beamlines.

probably not :(

Existing MX packages all take a file prefix/pattern and read directly from the filesystem. Unless we're able to exceed the performance of whatever san/nas filesystem is used, it would just be slowing things down.

New MX packages are starting to look at being able to consume streams, but the point of that is to get a stream directly from the detector to minimize latency.

Only place I see this maybe benefiting MX is in the backend of the SynchWeb knockoff? But that'll be fetching jpg's of images via FileStore?

-matt

danielballan commented 9 years ago

This is an issue about handlers for images from Eiger detectors. We could use filestore to index a collection of JPEGs, but that would use a JPEG handler, nothing to do with Eiger, right?

cowanml commented 9 years ago

uh, I think so?

I was just responding about the lazy loading stuff being interesting for MX.