NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io
Other
52 stars 16 forks source link

Storing motion correction displacements efficiently #228

Open bendichter opened 5 years ago

bendichter commented 5 years ago

I am helping the Losonczy Lab convert their calcium imaging data and I am having trouble with storing motion corrected images. Their software, SIMA, never actually stores the motion corrected images, but instead stores the displacements and computes the corrected images on the fly:

https://github.com/losonczylab/sima/blob/4bc7de8f5d3e520faa92d677fb8fb57515b3d2d2/sima/sequence.py#L1068-L1088

I actually think this is a better approach, because it takes up much less space. The displacement array is very sparse, and when it does have values they are ints and are repeated, so gzip will be very effective. As shown above, the code to apply this alignment is simple and we can make it even simpler by requiring the shape of the displacement array to be the same shape as the data array. NWB does have the ability to store this data in ophys.CorrectedImageStack.xy_translation, however this object also requires the original ImageSeries (which I suppose can be linked) and the corrected ImageSeries (which cannot). This is a huge data overhead that it seems silly to require. It would necessitate doubling the size of data just to communicate how frames are shifted.

Here is how I recommend we fix this in a minimal, non-breaking way:

bendichter commented 5 years ago

@ageorgou

ageorgou commented 5 years ago

Hi @bendichter. Did you mean to tag me? I've no connection to the Losonczy Lab, I'm afraid! (that said, your solution seems reasonable...)

bendichter commented 5 years ago

@ageorgou Yes I did, you work with Ca imaging, right? Do you have the same issue? I was hoping to see if this would work across multiple labs. How do you store motion correction data?

ageorgou commented 5 years ago

The lab I'm working with have some imaging data, but we don't do motion correction as far as I'm aware. I'll ask around and report back if I'm forgetting something.

EDIT: Motion correction is currently handled by the capturing software, so we don't do or record any processing - sorry I can't offer any input but thanks for asking!

oruebel commented 5 years ago

@nclack can you have a look at this proposed schema change. If I remember correctly ScanImage was also dealing with this (or maybe I'm wrong).

bendichter commented 5 years ago

@ageorgou

EDIT: Motion correction is currently handled by the capturing software, so we don't do or record any processing - sorry I can't offer any input but thanks for asking!

That is useful input :-)

So where are we on this? Can we make this field optional? It's currently a roadblock for one of my datasets

bendichter commented 5 years ago

@nicain, I believe you are also dealing with this issue?

nicain commented 5 years ago

Yes; I am having two problems:

1) We publish neither the "original" nor the "corrected" ImageSeries. This is because the size of the resulting file would be far too much for us to serve practically. Instead, we serve these data separately via S3. Because of this, my request would be that "original" and "corrected" be optional.

That being said, I can imagine the following reasonable push-back: Is it really an "image stack" if you havent provided the images? I think not. So an alternative would be to accept some sort of URI (file system path, or web URL) to redirect the interested user to the resource another way. In this case, I would still request "original" be optional, so long as the translation information used to compute the "corrected" is required. But then again, others will reasonably argue that the "original" be required, so long as the translation information needed to motion-correct this original source is required. Between these two options ("original" required but not "corrected", vs. "corrected" required but not "original"), ultimately, I think it is a philosophical debate that needs to be resolved: Is an NWB file a record of the experiment (and so "original" required and "corrected" not), or is an NWB file a record of a finished product including the provenance needed to compute that final state (and so "corrected" required and "original" not). The only way to satisfy both viewpoints is to make BOTH optional, which is my final recommendation (and will support me use-case the best).

2) xy_translation is supposed to be timeseries, but is really should be two separate timeseries. Of course one can make a data attribute on a single timeseries that is 2xN (N is the number of timestamps), but then there is no way to say whether the 1st "row" is "x" and the 2nd "row" is "y", or vice-versa. My recommendation is to split this into attributes on CorrectedImageStack: x_translation and y_translation. I strongly recommend this approach.

tldr; make "corrected" optional, "original" optional, and split xy_translation into x_translation and y_translation

Sorry for the novel, and thanks for all the hard work!

oruebel commented 5 years ago

So an alternative would be to accept some sort of URI (file system path, or web URL) to redirect the interested user to the resource another way

This is something we have on the agenda as part of the NIH grant. However, I'd expect this to be more on the 6-12month timescale. Doing this is non-trival because you have to standardize a couple of components here: 1) how you store this sort of foreign field in the file, 2) what you support for transfer of the data (e.g., do you ship it as JSON, HDF5, BSON etc.), 3) how does the web-API look like for evaluating the requests etc, .

nclack commented 5 years ago

@oruebel I realize I'm getting on this a little late but I can chime in wrt ScanImage metadata. We store possibly-3d translations with respect to a reference stack/image. The image/volume data we store is uncorrected. We use the motion estimates to stabilize targeting etc.