tritemio opened this issue 7 years ago
More thoughts. For scanning data, adding the number of rows and columns to `/setup` would allow using a simple 1D array in `detectors`.

Let's say we have a 256×512 image. Each pixel can be identified by a linear index `index = np.arange(131072)` (i.e. [0..131071]). The 2D position of each pixel is then retrieved from `index.reshape(nrows, ncols)`. Any 2D mask or ROI can be applied to the reshaped `index` and will provide the pixel selection. From there, building a mask for `detectors` which selects multiple pixels is trivial, even though maybe not the most efficient approach (see the sketch below).
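A minimal numpy sketch of this linear-index scheme (sizes taken from the example above; the per-photon `detectors` array is filled with random values for illustration):

```python
import numpy as np

nrows, ncols = 256, 512
index = np.arange(nrows * ncols)          # linear pixel index, [0..131071]
index_2d = index.reshape(nrows, ncols)    # 2D position of each pixel

# Any 2D ROI mask yields the corresponding linear detector indices
roi = np.zeros((nrows, ncols), dtype=bool)
roi[100:150, 200:300] = True
selected = index_2d[roi]                  # linear indices of the pixels in the ROI

# Mask for a per-photon detectors array selecting those pixels
detectors = np.random.randint(0, nrows * ncols, size=10_000)
photon_mask = np.isin(detectors, selected)
```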
We may want to introduce provisional support for FLIM data in Photon-HDF5 v0.5. As we gain experience with it, we can see if there are any issues and adjust the format in a subsequent version. To this end, the proposal in my second comment, i.e. using `/photon_data/detectors` and adding `nrows`, `ncols` to `/setup`, seems the easiest as it requires only minimal modification to the specs.
I propose adding:

- `/setup/scanned` (optional): True only if the measurement implements some kind of scanning (confocal scanning, sample scanning, etc.)
- `/setup/scan_nrows`: number of rows in a scan. Only present if `/setup/scanned = True`.
- `/setup/scan_ncols`: number of columns in a scan. Only present if `/setup/scanned = True`.

Then, if the data contains several Z planes, each plane will be in a different `/photon_dataX` group. The detectors will be numbered linearly from 0 to `num_pixels - 1`. The 2D position of each "pixel" can be retrieved with reshaping as in the previous comment.

We should also encode the step in X-Y-Z, for example with the tuple `/setup/scan_delta_xyz` (see the sketch after this list).
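To make the proposal concrete, here is a minimal h5py sketch of these fields (Photon-HDF5 files are normally written via phconvert; file name, sizes and units are illustrative only):

```python
import h5py
import numpy as np

with h5py.File('scan_example.hdf5', 'w') as f:
    setup = f.create_group('setup')
    setup['scanned'] = True        # measurement implements some scanning
    setup['scan_nrows'] = 256      # rows in a scan
    setup['scan_ncols'] = 512      # columns in a scan
    # step sizes along X, Y, Z (assumed here to be in meters)
    setup['scan_delta_xyz'] = np.array([100e-9, 100e-9, 500e-9])
```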
Hello, I'm starting to build an acquisition system and analysis tools for TCSPC data, aiming at FLIM-type acquisition among others (by the way, I'm actually forking the project to add reading of PicoQuant .phu files (histograms) and .ptu files in T2 mode, plus minor corrections to the current T3-mode reading). I've been following the Photon-HDF5 format so far, but I'm at the point where it would be best to hear your latest thoughts on saving FLIM data. I've read your post on FLIMfit (https://github.com/flimfit/FLIMfit/issues/334) and was wondering if you have made any progress since? I'm happy to start building some test file format. Here are my thoughts:
- There is no format support for a file containing histograms directly (such as a .phu file from PicoQuant), because the data are already preprocessed from the timestamps during acquisition.
- As a corollary, the Photon-HDF5 format is not at the moment adequate for FLIM analysis, since this requires a histogram for each pixel (unless the analysis first processes the raw timestamps, but then there is no more need to keep the actual timestamps... even if it is always good to keep the raw data somewhere).

My conclusion is that one should keep the raw data as it is now (with extra metadata to tell which pixel or line each timestamp belongs to), and then add a group such as "histograms" where histograms (like those in a .phu file from FLIM) could be stored (temporarily or not). In the same way, I would add an analysis group where images extracted from lifetimes (for instance) could be stored along with the metadata of the analysis... but this is maybe going too far.
Finally, I'm also working at the interface of electron microscopy and spectroscopy, where datacubes are often generated (often with a spectrum (photon or electron energy) for each pixel). This corresponds exactly to the FLIM data issue. There is an extensive project currently running for the analysis of such datacubes (hyperspy), mostly devoted to electron microscopy but with numerous general tools for datacube analysis (fitting, filtering...). Maybe it would be interesting to see what could be done, such as the possibility of loading Photon-HDF5 raw data in hyperspy and then processing the datacube there...
Hi @seb5g, thanks for reaching out! I totally support the idea of finding synergies between different projects and applications. No, I have not worked on FLIM support except for laying down some preliminary ideas here.
You are right, we designed Photon-HDF5 for storing per-photon data, so currently there is no place to save TCSPC histograms. Similarly, in FCS people compute and analyze cross-correlation curves (ACF/CCF), but we don't store those curves. It is certainly true that most FLIM analysis is done on histograms, but you can in principle do phasor analysis directly on timestamps. On the other hand, datasets can be big, so some people don't store timestamps at all.
We view Photon-HDF5 as long-term storage of raw data that does not change often. For this reason we have not added support for processed data (e.g. histograms, or burst data in smFRET). While per-photon data fits in a common structure for a large range of measurements, processed data is very specific to each type of measurement and is difficult to support in a common format.
So, in terms of supporting FLIM in Photon-HDF5, I think we should focus on storing the raw timestamp data with enough metadata around it, as you said. I am totally open to suggestions on how to structure the layout in the most effective way.
I am not sure about the TCSPC histograms. They may be of potential interest for other TCSPC applications too, so I may be convinced to accept a general, non-FLIM-specific proposal to add TCSPC histograms to Photon-HDF5 (maybe a 2D array of "nanotimes" histograms in each photon_data group? See the sketch below).
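For instance, a hypothetical sketch of such a 2D histograms array (one TCSPC histogram per detector/pixel), computed here from random per-photon data:

```python
import numpy as np

# Hypothetical per-photon arrays of one photon_data group (random for illustration)
n_detectors, n_tcspc_bins = 16, 4096
detectors = np.random.randint(0, n_detectors, size=100_000)
nanotimes = np.random.randint(0, n_tcspc_bins, size=100_000)

# 2D array of "nanotimes" histograms: shape (n_detectors, n_tcspc_bins)
histograms = np.zeros((n_detectors, n_tcspc_bins), dtype=np.int64)
np.add.at(histograms, (detectors, nanotimes), 1)  # unbuffered accumulation
```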
For the images, I would put them in a second HDF5 file, probably together with the histograms. You can keep the two files with the same name and different suffixes (e.g. `___.ph5.hdf5` and `___.flim.hdf5`). If you think about data sharing, somebody interested in the histograms/images may not be immediately interested in the timestamps, so it is convenient to be able to download the two files separately.
The "processed FLIM" format and Photon-HDF5 can be made inter-operable. For example they can share fields in sample
, identity
, provenance
or description
and acquisition_duration
and (potentially) "histograms". Copy of metadata between the two can be automated. HDF5 allows links to datasets in a different file. So, as long as the files are kept together, both processed dataset can allow accessing all the raw data in the Photon-HDF5 file.
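A sketch of such a cross-file link with h5py (file names are placeholders for the `___.ph5.hdf5` / `___.flim.hdf5` pair mentioned above):

```python
import h5py

# From the processed FLIM file, link back to the photon_data group of the
# Photon-HDF5 file; the link resolves as long as the two files are kept together.
with h5py.File('experiment.flim.hdf5', 'a') as flim:
    flim['raw_photon_data'] = h5py.ExternalLink('experiment.ph5.hdf5', '/photon_data')
```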
I had a look at hyperspy: it is a very interesting project, and I see how it can be very useful in many applications. As you said, hyperspy would need to preprocess the Photon-HDF5 data in order to make it fit its data model.
Regarding adding more file support to phconvert, I'm looking forward to your PR! 👍
I am still interested in adding support for FLIM to Photon-HDF5. I would like to participate in any discussion regarding the required metadata. With regard to the histograms, we could take the following approach. If there is no existing standard, we could develop a convenient way to organize the histograms within an HDF5 file, keeping it separate from the Photon-HDF5 specification. We could keep it in a separate file at this point. Once we feel it is ready and useful, we can determine whether it is better to keep it in a separate file or add it to the Photon-HDF5 specification.
@talaurence, sounds like a good plan.
If you like, we can create a new repo within the Photon-HDF5 organization to draft these specs.
For a separate lifetime imaging project, I am using the HDF5 file format via PyTables. The size of the data cube is 1024×1024×4096 (x pixels, y pixels, lifetime histogram bins). I placed the entire data cube in a "carray", and I can access any part of it at will. I did not clip out any zeros - I just put the whole array in. Using the "blosc" compression filter, the size of the data file went from 8 GB to ~900 MB, which is much more manageable. I think we can keep this format fairly simple: let's just make one big array and let the HDF5 library do the hard work. We can add some metadata definitions and leave it at that. (A sketch of this scheme is below.)
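A minimal PyTables sketch of this scheme, assuming the shape and filter settings described above (file and array names are illustrative):

```python
import numpy as np
import tables

filters = tables.Filters(complevel=5, complib='blosc')
with tables.open_file('flim_cube.h5', mode='w') as h5:
    cube = h5.create_carray(h5.root, 'lifetime_image',
                            atom=tables.UInt16Atom(),
                            shape=(1024, 1024, 4096),
                            filters=filters)
    # Slices are written/read chunk by chunk, so the full cube
    # never needs to be held in memory at once.
    cube[0, 0, :] = np.arange(4096, dtype=np.uint16)
    single_pixel_hist = cube[0, 0, :]
```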
Yes - let's create a new repo.
I'm not at all sure it is wise to keep one big cube of data. The way an analysis is done is by extracting info from the 1D traces. In previous work, I stored them as such, with info on the related pixel encoded in the name of the 1D trace. That made exploring the data easier, e.g. in a tree view such as HDFView: setting up the analysis on one trace, then looping over the pixels to reconstruct an image. I now think the pixel should not be encoded in the name, but rather in line and column metadata. Sébastien
I am not sure I understand the problem with a large data cube. If we do not use that format, we will have hundreds or thousands of arrays and groups within the file with some sort of code for rows and columns, making it much more confusing. The HDF5 format was created in part in order to deal with large data sets such as this. Storing the entire array on disk in a single array does not mean that one has to load it into memory that way.
For example, if you want the lifetime histogram of a single pixel, the following command retrieves it from the file:

```python
lifetime_hist = self.lifetime_image[row, col, :]
```

If you want a whole row:

```python
row_hists = self.lifetime_image[row, :, :]
```

If you want a lifetime histogram summed over a region:

```python
sub_array = self.lifetime_image[row_min:row_max, col_min:col_max, :]
lifetime_plot = np.sum(sub_array, axis=(0, 1))
```
I think this would be a much cleaner storage format.
Well, put like this, I kind of agree, but imagine now you want to scan over a region that is not a square or a rectangle but a given set of representative pixels, or a union of smaller areas: then your description is lost. To my mind, the spirit of a unified file format is to be able to expand to things/applications we didn't foresee at the beginning. In the case of FLIM, I agree it will mostly be images, but I want to use this format for all hyperspectral (or temporal) kinds of signals and applications (I have three different examples right here in my lab).
I agree with @talaurence. Once you have the array you can apply advanced indexing to select ROIs along any dimensions you want (numpy advanced indexing allows using boolean masks or lists of indices for each dimension; see the sketch below). The alternative of having millions of 1D arrays does not scale, even performance-wise. These datasets could be loaded not only as numpy arrays but also as hyperspy datasets or xarray arrays, so that dimensions are labeled.

I have a hard time imagining an application where saving the TCSPC histograms as N-D arrays (with appropriate chunking) would not be convenient. @seb5g, can you explain your application and why it would not fit this model?
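A small numpy sketch of the non-rectangular case (the cube is random data standing in for an array read from HDF5):

```python
import numpy as np

cube = np.random.poisson(1.0, size=(64, 64, 256))  # (rows, cols, TCSPC bins)

# Boolean-mask ROI: any set of pixels, e.g. a union of two smaller areas
roi = np.zeros((64, 64), dtype=bool)
roi[10:20, 30:40] = True
roi[40:50, 5:15] = True

# cube[roi] has shape (n_selected_pixels, 256); summing gives the ROI histogram
roi_hist = cube[roi].sum(axis=0)
```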
I'm back on this subject, as I have finally reached the point where I will have to save custom FLIM data. Concerning the discussion above, I'm a bit lost. If we follow the first specifications you derived in this issue, only photons are saved and no (reconstructed) FLIM data. I think that is best. Indeed, to reconstruct FLIM one needs to specify a time bin, which is not always the minimal possible one, so this is up to the person processing the data.
If I summarize, you propose to add the `/setup/scanned`, `/setup/scan_nrows` and `/setup/scan_ncols` fields described above.
That said, did you specify the new array containing the linear index representing the "position" in the image? I think you said we should create a new array, but then you mentioned the `detectors` array. The latter would collide with the other detector definition. So why not add:

`detectors/linear_index`: linear index within the image (you get the (X, Y) position from the image shape)
We're working on an open source framework for laser scanning microscopy, including FLIM (not released just yet, but here is the overview: https://loci.wisc.edu/software/openscan).
I agree with the view that FLIM histogram data, if stored, should go in a separate format. It has many more requirements (e.g. it may be organized in a time series and/or Z stack, or have other dimensions), and it is much more similar to microscopy image data than to (TC)SPC timestamp data.
But it would be great if Photon-HDF5 could store the photon timestamp data from a FLIM experiment without losing information. The most important missing feature, if I am not mistaken, is the ability to store "marker" events. (It seems to be the only fundamental feature missing compared to BH `.spc` files and PicoQuant `.ptu` files, aside from metadata details.)
In a typical LSM TCSPC-FLIM system where the laser scanning is performed by a separate device, the scanner sends some combination of frame, line, and pixel "clock" or "marker" signals (not necessarily all 3) to the TCSPC device, which records the timestamps of these events along with the photons. These are used (together with user-configurable delay adjustments and other parameters) to assign photon events to pixels and to exclude photon events occurring during horizontal and vertical retrace.
I think it makes a lot of sense to store the hardware-generated marker timestamps, rather than interpreted pixel locations of photons, because in theory the latter can be adjusted in postprocessing (e.g. to correct for scan phase). And it's always good to keep the raw data in case one needs to investigate possible bugs in processing code: even just assigning photons to pixels has a lot of edge cases, especially when trying to support a wide range of hardware setups!
I see that the "latest" documentation mentions in passing recording markers as special detectors, but as far as I can tell it is not yet specified how to indicate in the file that a given detector is a marker channel. (I would appreciate pointers if I'm missing something.) Also, I'm not sure if it is a good idea to store marker timestamps in the same array(s) as photons, because that would mean any code reading files will need to know about markers. In addition, markers don't have nanotime even if the photons do. It might be cleaner to store marker timestamps in a separate, optional, array, or set of arrays. This has the additional advantage of making it easier to write code that processes photons according to markers, especially if the time offset between markers and photons can range from positive or negative.
Once we can record marker timestamps, it will probably make sense to record the necessary metadata to assign photons to FLIM pixels: this may include pixels-per-line, lines-per-frame, pixel rate (critical if pixel marker not used), line delay (or time offset between markers and photons), etc. This way, it will be possible to generate FLIM histograms without any external data.
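To make the pixel-assignment step concrete, here is a minimal numpy sketch using only a line marker plus pixels-per-line and line duration; all names and the exact correction scheme are illustrative, not part of any spec:

```python
import numpy as np

def assign_photons_to_pixels(photon_ts, line_marker_ts, pixels_per_line,
                             line_duration, line_delay=0):
    # Most recent line marker preceding each photon (-1 if before the first)
    line_idx = np.searchsorted(line_marker_ts, photon_ts, side='right') - 1
    valid = line_idx >= 0
    # Elapsed time within the current line, corrected for scan-phase delay
    t_in_line = photon_ts - line_marker_ts[np.maximum(line_idx, 0)] - line_delay
    pixel_idx = np.floor(t_in_line * pixels_per_line / line_duration).astype(np.int64)
    # Drop photons outside the active part of the line (e.g. retrace)
    valid &= (pixel_idx >= 0) & (pixel_idx < pixels_per_line)
    return line_idx, pixel_idx, valid

# Example: the third photon falls past the line end (retrace) and is flagged invalid
line, pix, ok = assign_photons_to_pixels(np.array([105.0, 220.0, 260.0]),
                                         np.array([100.0, 200.0]),
                                         pixels_per_line=10, line_duration=50.0)
```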
But just being able to record markers might allow me to stop writing `.spc` files.
I don't have anything against also supporting recording the pixel coordinates of each photon; it may be quite useful in some workflows.
Would there be any interest if I were to come up with a more concrete proposal for how to record marker events? Or are there any existing plans for this that I didn't find?
I've included PicoQuant hardware in an open-source framework called pymodaq (http://pymodaq.cnrs.fr), which stands for Modular Data Acquisition with Python. Any hardware can be included as a lightweight plugin. I think LSM falls within this category, as I use pymodaq for FLIM and many other types of experiments. Maybe we could build something out of your software and mine?
> Would there be any interest if I were to come up with a more concrete proposal for how to record marker events? Or are there any existing plans for this that I didn't find?
I would be interested in a proposal for adding marker arrays to the photon-hdf5 format. I agree with your assessments.
I have also been developing software for FLIM applications using PicoQuant hardware. So far we have been performing sample scanning, but we plan to move on to laser scanning soon. I would be interested in comparing your software with mine. I can put mine on GitHub, but there is an approval process I need to go through here; I will start that process and put it on GitHub as soon as I can. There is also "ScopeFoundry", Python-based microscope development software from LBNL, and I believe they have some options for PicoQuant hardware as well. That may be of interest to all of our efforts.
Glad to hear I'm not the only one interested in this. Our code is in C and C++ and we've only used a Becker & Hickl device so far, although I have studied the PicoQuant API and data format. I am contemplating a separate library for streaming TCSPC data handling (including live histogramming), but I don't know how far I'll pursue that.
I'll open a separate issue with my proposal for storing marker timestamps, since it feels a little diverged from the title of this issue.
As for @talaurence's request, we should think about how to handle 3D FLIM data.

Initially we thought this case could be covered by a single `photon_data` group with (x, y, z) labels. But in the 3D case, keeping the whole 3D data in a single array is not convenient. I think it would make sense to have a `photon_data` group for each z, and then add an array of (x, y) positions to reconstruct a slice from a single array of timestamps/nanotimes.

We could use the `detectors` array to give the (x, y) coordinates, or we could define a new array (`coordinates`, `positions`?). I tend to prefer the latter option of adding a new array. But we have to think about whether this case needs some special configuration in `/setup`.
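A sketch of the slice reconstruction this enables, assuming a hypothetical per-photon `positions` array of linear pixel indices (random data here) and the scan shape from `/setup`:

```python
import numpy as np

nrows, ncols = 256, 256  # assumed /setup/scan_nrows, /setup/scan_ncols

# Hypothetical per-photon linear pixel index for one z-slice's photon_data group
positions = np.random.randint(0, nrows * ncols, size=1_000_000)

# Intensity image of the slice: photon counts per pixel, reshaped to 2D
image = np.bincount(positions, minlength=nrows * ncols).reshape(nrows, ncols)
```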