flimfit / FLIMfit

State of the art fluorescence lifetime imaging analysis software
http://flimfit.org
GNU General Public License v2.0

File format for FLIM #334

tritemio opened this issue 7 years ago

tritemio commented 7 years ago

Hi,

I am the main author of Photon-HDF5, an open, fast, and space-efficient file format for timestamp-based spectroscopy data. Photon-HDF5's main use cases involve smFRET and FCS data. We recently had a request to add FLIM support to Photon-HDF5. My main concern is understanding what formats people currently use for FLIM and whether introducing FLIM support in Photon-HDF5 would be worth the effort. Since I don't have first-hand experience with FLIM data, I was wondering if you could chime in and give your opinion.

In particular, I'd be glad if you could comment on the following questions. What file formats are currently used for FLIM? Are they all vendor-specific? Are they self-contained (rich metadata, containing all the information needed for analysis, suitable for permanent archival)? Does OMERO support timestamp-based FLIM data well?

If you think adding FLIM support to Photon-HDF5 is worth it, what would be an analysis-friendly structure for FLIM data? Currently I am thinking of having a single array of timestamps and TCSPC time lags for each Z plane. A "positions" array would give the 2D position (i.e. the "pixel") of each timestamp. Ideas/comments?

Thanks!

EDIT: This is not an issue with FLIMfit. If GitHub is not an appropriate place for this kind of discussion, I am open to moving it to a better forum.

seanwarren commented 7 years ago

Hi Antonino,

Happy to help. I read the Photon-HDF5 paper with some interest when it came out.

What file formats are currently used for FLIM? Are they all vendor-specific? Are they self-contained (rich metadata, containing all the info needed for analysis, suitable for permanent archival)?

At the moment FLIM file formats are vendor-specific and vary widely in terms of the metadata they include. I'm sure you are familiar with the main systems supporting timestamp-based imaging for FLIM, Becker & Hickl and Picoquant. Picoquant have a slightly different format for imaging data, but I think it broadly follows a similar structure to their other formats. There are a number of other formats for histogrammed data (i.e. a 3D cube of {t, x, y} with photon counts per bin), notably some 'interesting' data formats from LaVision.

To turn a stream of photon events into an image, marker signals are included in the event stream, e.g. frame clock, line start, line stop, and pixel clock, generally generated by the microscope scanner. These signals can vary quite significantly from microscope to microscope: some systems use pixel clocks, some don't, some merge markers, some use linear scanners, some follow a sinusoidal pattern, etc.
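To make the marker-decoding step concrete, here is a minimal sketch (not FLIMfit's actual implementation) of how photons arriving between line-start and line-stop markers can be mapped to pixel positions by linear interpolation. The event encoding and marker codes here are invented for illustration; real vendor formats pack these quite differently:

```python
# Hypothetical event kinds; real vendor formats encode these differently.
PHOTON, FRAME_CLOCK, LINE_START, LINE_STOP = 0, 1, 2, 3

def assign_pixels(events, n_pixels):
    """Map each photon to a (line, pixel) position by interpolating its
    macrotime between the enclosing line-start/line-stop markers.
    `events` is a list of (timestamp, kind) tuples in arrival order."""
    positions = []        # one (line, pixel) per photon, in arrival order
    line = -1
    t_start = None
    buffered = []         # photon timestamps within the current line
    for t, kind in events:
        if kind == FRAME_CLOCK:
            line = -1                     # restart line counter each frame
        elif kind == LINE_START:
            line += 1
            t_start, buffered = t, []
        elif kind == LINE_STOP and t_start is not None:
            span = t - t_start
            for tp in buffered:           # linear time-to-pixel mapping
                pix = min(int((tp - t_start) * n_pixels / span), n_pixels - 1)
                positions.append((line, pix))
            t_start = None
        elif kind == PHOTON and t_start is not None:
            buffered.append(t)
    return positions
```

A sinusoidal scanner would replace the linear mapping with an arcsine correction, and a pixel-clock system would count clock ticks between markers instead of interpolating.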

Does OMERO support timestamp-based FLIM data well?

OMERO currently has no support for timestamp-based FLIM data. The OME format has a 2D image plane as its base unit, with support for ZCT stacks. It supports histogrammed FLIM data through a bit of a hack, where the microtime axis is grafted onto one of the ZCT dimensions supported by OMERO.

If you think adding FLIM support to Photon-HDF5 is worth it, what would be an analysis-friendly structure for FLIM data? Currently I am thinking of having a single array of timestamps and TCSPC time lags for each Z plane. A "positions" array would give the 2D position (i.e. the "pixel") of each timestamp. Ideas/comments?

This is an interesting question. The idea of a well-documented, universal format with plenty of metadata certainly appeals to me - maintaining support for endless slightly different data formats in FLIMfit isn't hugely rewarding work!

As far as I am aware, however, all FLIM analysis software (certainly FLIMfit) works with histogrammed data in one way or another. This is true both for fitting and for phasor-based approaches. Any analysis would therefore want to histogram the photon data first, so the advantage of having an 'intermediate' photon format sitting between the raw data and the histogrammed data used for analysis isn't immediately obvious to me.
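For reference, the histogramming step itself is simple once each photon carries an (x, y) position and a TCSPC microtime; a minimal NumPy sketch (array and parameter names are illustrative, not taken from any particular format):

```python
import numpy as np

def histogram_flim(x, y, microtime, shape_yx=(256, 256), n_tbins=256):
    """Bin photon events into a (t, y, x) decay cube of photon counts."""
    x, y, microtime = (np.asarray(a) for a in (x, y, microtime))
    cube = np.zeros((n_tbins,) + shape_yx, dtype=np.uint32)
    tb = np.minimum(microtime, n_tbins - 1)   # clamp overflow into the last bin
    np.add.at(cube, (tb, y, x), 1)            # unbuffered in-place accumulation
    return cube
```

`np.add.at` is used rather than plain fancy-indexed assignment so that repeated photons in the same (t, y, x) bin all get counted.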

Something to bear in mind here is file size; you get a lot of photon/marker events in a typical FLIM image. B&H and Picoquant use 32-bit events, and many of our single images are several hundred megabytes. If you are acquiring time courses or z-stacks, this can quickly add up. As far as I understand, Photon-HDF5 uses a 64-bit format for the photon events; add (say) 16 bits each for the (x, y) position and you will triple the file size immediately. It might be better to keep the use of 'marker' events, and instead include more substantial metadata about which markers to expect and how they should be interpreted, something that is often implicit at best.
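The tripling estimate is easy to check with quick arithmetic; the event count below is illustrative, assuming packed 32-bit vendor events versus 64-bit timestamps plus two 16-bit coordinates:

```python
def image_size_mb(n_events, bits_per_event):
    """Uncompressed size in megabytes for a given event encoding."""
    return n_events * bits_per_event / 8 / 1e6

n = 50_000_000                              # events in one large image (illustrative)
vendor = image_size_mb(n, 32)               # packed 32-bit vendor events
unpacked = image_size_mb(n, 64 + 16 + 16)   # 64-bit timestamp + 16-bit x + 16-bit y
```

Here `vendor` comes to 200 MB and `unpacked` to 600 MB, i.e. exactly 3x before any compression.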

I hope that is useful. If this is something you'd like to follow up on, I have some C++ code for interpreting timestamped FLIM data from B&H and Picoquant, which supports a pretty wide range of microscope systems: https://github.com/flimfit/FLIMreader. This might be a useful starting point at least, and I'd be happy to discuss some more of the details with you.

imunro commented 7 years ago

Hi Antonino

In addition to the vendor-specific TCSPC formats described above, we have some users who use a gated/wide-field technique (effectively a very high-speed shutter); this data is stored as OME-TIFF.

To answer your OMERO question: FLIMfit is built with the OME Bio-Formats library. This supports Becker & Hickl's (B&H) .sdt and Picoquant's .bin formats for histogrammed data, as well as the OME-TIFF format, so these formats are well supported by OMERO. The standard OMERO clients are not always optimal for viewing FLIM data, however, which is where FLIMfit fits in.
There is also support in Bio-Formats for B&H's non-histogrammed .spc format, although this has so far been only minimally tested due to a lack of test data.

This leaves a third vendor, LaVision BioTec. Their Imspector software can save data either in their own .msr format (supported by Bio-Formats, and therefore OMERO, but not for all versions) or as OME-TIFF (which LaVision recommend that you use). Unfortunately they use an old version of the OME-TIFF standard that predates the extension to more than 5 dimensions and cannot, therefore, handle FLIM. These files can therefore be imported and viewed in OMERO, but will not be recognised as FLIM data either by the standard OMERO clients or by FLIMfit when used as an OMERO client.

tritemio commented 7 years ago

Thanks @imunro and @seanwarren for your extensive answers, they are very helpful indeed! Now I have a fuller picture. We are about to release Photon-HDF5 v0.5 (within weeks to a month), so I think any FLIM support would be premature at this point.

You are right about timestamped FLIM data being very large. At this point it seems that timestamps would give no immediate advantage for FLIM analysis, but they could allow future reanalysis with new methods. It may also be possible to estimate lifetimes from far fewer photons if the histogramming is skipped (for example, by fitting an MLE model or applying phasor analysis directly to the TCSPC time lags).
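To sketch the no-histogram idea: the phasor coordinates are just the first Fourier coefficients of the decay, so they can be averaged directly over the raw time lags of each photon. A minimal example, assuming ideal mono-exponential photons and no IRF (the lifetime and laser period below are made up for illustration):

```python
import numpy as np

def phasor(timelags, period):
    """Phasor coordinates (g, s) computed directly from raw TCSPC time
    lags, skipping the histogramming step entirely."""
    w = 2 * np.pi / period                     # angular laser frequency
    return np.cos(w * timelags).mean(), np.sin(w * timelags).mean()

# For a mono-exponential decay with lifetime tau, theory predicts
# g = 1 / (1 + (w*tau)**2) and s = w*tau / (1 + (w*tau)**2).
rng = np.random.default_rng(0)
tau, period = 2.0, 12.5                        # ns, e.g. an 80 MHz laser
g, s = phasor(rng.exponential(tau, 200_000), period)
```

The same per-photon averaging trick works for MLE lifetime fitting, where the log-likelihood is a sum over individual time lags rather than histogram bins.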

So, let's say, there is a potential advantage.

Let's talk about implementation. Photon-HDF5 uses separate arrays for timestamps and TCSPC time lags. Thanks to the efficient compression in HDF5, files are smaller on disk than the original SDT or PTU files. As a rule of thumb, I've seen that 64-bit timestamps take 4-5x less space on disk than in memory (effectively ~16 bits or fewer per timestamp). You only ever load a portion of the array into RAM, so you can work with larger-than-RAM datasets, and decompression happens transparently with very little performance hit, or even with better performance than no compression (depending on disk speed and the type of compression). The advantage of "unpacked" arrays is simplicity of use: all the custom rollover corrections (which are specific to each input format) are already applied, and the data is ready to use.
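As an illustration of the rollover correction that "unpacked" arrays bake in: hardware macrotime counters have limited width and wrap around periodically, so the raw values must be unwrapped into monotonic 64-bit timestamps before storage. A sketch assuming a simple 16-bit counter where wraps are detected from decreasing values (real vendor formats typically use dedicated overflow records and different counter widths):

```python
import numpy as np

def unwrap_timestamps(raw, counter_bits=16):
    """Undo counter rollover: each time a raw timestamp decreases, the
    hardware counter must have wrapped, so add one more full period."""
    raw = np.asarray(raw, dtype=np.int64)
    wraps = np.cumsum(np.diff(raw, prepend=raw[0]) < 0)  # wraps seen so far
    return raw + wraps * (1 << counter_bits)
```

The resulting monotonic int64 array is the kind of thing that would then go straight into a chunked, compressed HDF5 dataset.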

For FLIM data I would also use "unpacked" arrays plus compression. The choice of compression algorithm sets the trade-off between disk space and execution speed. There are a bunch of fancy compressors to try. The one I know best is Blosc:

But a quick search also gave:

Compression of integer arrays is an active research topic in computer science, and I think it would be useful to survey some of the recent algorithms.
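One cheap trick worth benchmarking alongside the fancier compressors: timestamps are monotonically increasing, so delta-encoding them before any general-purpose compressor usually shrinks them dramatically. A sketch using stdlib zlib on synthetic data (Blosc or another codec would slot in the same way; the photon rate is made up):

```python
import zlib

import numpy as np

def compressed_ratio(arr):
    """Compressed size as a fraction of the raw in-memory size."""
    raw = np.ascontiguousarray(arr).tobytes()
    return len(zlib.compress(raw, 6)) / len(raw)

rng = np.random.default_rng(1)
# Synthetic 64-bit macrotimes: cumulative sum of exponential inter-photon gaps.
ts = np.cumsum(rng.exponential(500.0, 100_000)).astype(np.int64)
r_plain = compressed_ratio(ts)           # compress absolute timestamps
r_delta = compressed_ratio(np.diff(ts))  # compress the deltas instead
```

The deltas occupy only a couple of low bytes per 8-byte word, so they compress to a noticeably smaller fraction than the absolute values; a byte-shuffle filter (as in Blosc or HDF5's shuffle option) exploits the same structure.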

Bottom line: we would need some test data to play with in order to understand the trade-off between space and speed.

seanwarren commented 7 years ago

No problem. Yes, you could probably get back to something similar to the original file size with compression (although if you only have 16 bits of time resolution, it would seem easier to store the timestamps as 16-bit values in the first place?). I agree that it is useful to do the "unpacking" as a separate step.

If you would like any test data please let me know, I have files from quite a variety of systems people have sent me to test with FLIMfit.