Photon-HDF5 / photon-hdf5

Photon-HDF5 Reference Documentation
http://photon-hdf5.readthedocs.io/
3 stars 3 forks source link

Support marker timestamps (for FLIM, etc.) #44

Open marktsuchida opened 4 years ago

marktsuchida commented 4 years ago

Based on discussion in #41, here is a proposal for supporting "marker" events generated by TCSPC hardware (sometimes called external markers). Among other things, this will allow storing all raw data needed for FLIM and related experiments.

This can be considered a "part 1" for supporting FLIM data. A "part 2" (not included here) would need to specify additional metadata fields to fully describe a FLIM experiment (such as: pixel rate, marker-to-scan-clock correspondence, and marker time offsets), but we first need a specification for storing the marker data itself.

Proposed additions:

Points that may need discussion:

I'm prepared to create a pull request if there is interest and agreement on the format.

smXplorer commented 3 years ago

@marktsuchida: This sounds like a reasonable suggestion. In answer to your questions/comments,

Now the main issue is that @tritemio has moved to greener pastures and probably does not support photon-hdf5 anymore. Maybe @talaurence can help with that?

talaurence commented 3 years ago

I agree that these are reasonable suggestions, and getting them done is aligned with my current research work. I noticed that the original post is a few months old. Let us know if you remain interested in pursuing these additions.

marktsuchida commented 3 years ago

Thanks. I still have an interest in this happening, but I'm not sure when I will be able to get back to it. Happy to help if the timeframe is not that urgent.

@smXplorer: I'm not 100% sure I understood your question about unused bits, but the devices I'm familiar with (or at least BH boards, which are fresher in my memory) do not have the ability to turn off any of the fixed number of (hardware) marker input lines. In other words, the bits in the vendor native format always map directly to the hardware input terminals (they may be scattered in a weird order across a 32-bit record, so that requires reordering). So, for a device that has 4 marker lines (0...3), we would map those to the 4 (=marker_num_bits) least significant bits of the marker_bits value (lsb = marker 0). If the user does not physically connect anything to, say, marker 2, then bit 2 will always be zero, but there is no need to indicate that the bit is unused (or to shift the higher bits right). In fact it would be hard to do otherwise, because software has no way of knowing whether an actual signal was connected to a given marker line, and we certainly don't want to require end-user input regarding this (thinking as an acquisition software writer who wants to avoid loss of raw data).

My thinking is that because we need additional metadata (to be specified separately; the "part 2" as I called it above) anyway to describe how the marker bits map to application-specific interpretations (pixel clock, line clock, etc., if the application is FLIM), it is not important how the bits are ordered and it is okay if (a reasonable number of) unused bits are included. Users of the file format could even choose to map to bits in a way that does not correspond to hardware lines (say, in order to follow some local standard), although personally I would prefer to minimize such transformations in my own work. This does mean that a specification for that additional metadata is needed before we can fully describe a FLIM experiment, though.

smXplorer commented 3 years ago

@marktsuchida Sorry if I wasn't clear (and I wasn't). I meant to ask whether you were proposing a "flexible" scheme where, if, say, 3 markers are needed for an app, bit 0, 1 and 2 will be used, no matter whether the vendor uses, say, bit 1, 2 and 4. If you wanted to stick to the vendor's choice, you would have "unused" bits, but a lazy reader would be able to always assume, say, that bit 1 is always this marker, bit 2 that marker and bit 4 that other marker, without having to bother reading the definition of the markers (i.e. assuming that it is just the vendor's convention).

The question comes from the discussion we had in the past about detector ID and as a matter of fact, re-reading that section of the specs: https://photon-hdf5.readthedocs.io/en/0.5.dev/phdata.html#setup-detectors-group, I realize that these "markers" were included in the discussion. In other words, a "marker" is a "signal" in the corresponding "detector". Please read that section and let me know if you understand it the way I do.

marktsuchida commented 3 years ago

@smXplorer Regarding the detector ID section, please see my comment preceding this issue at https://github.com/Photon-HDF5/photon-hdf5/issues/41#issuecomment-556262946. In summary, I feel that storing markers as a special case of a detector is inconvenient at best.

In this context I need to reiterate my earlier question about whether timestamps are monotonically increasing, or just monotonically non-decreasing. If I'm not mistaken, if detector 0 and detector 1 see a photon at the same time (i.e., within the timestamp counter period), the same timestamp is repeated in /photon_data/timestamps (with 0 and 1 in /photon_data/detectors), is it not? And as far as I can tell, there is no rule about the ordering of detectors in such a case.

I think that probably works well enough for photon timestamps (though a format guaranteeing strict monotonic increase would have been nice), but a similar scheme for markers would make it harder to write correct code that assigns photons to pixels, because the order of markers stored in a file may not be the order in which they need to be processed. And it gets even more complicated if the same array (/photon_data/timestamps) is used for both photons and markers.

Getting back to your first question about app-based bit assignment -- are you proposing something like bit 0 = pixel clock, bit 1 = line clock, and bit 2 = frame clock? Aside from the fact that you could still have unused bits (because any 1 of those 3 clocks is sufficient for recovering a FLIM image, and not all setups have all 3), the difficulty I see is that then it becomes hard for users to use this format to store additional or non-standard markers that are not (yet) provided by the format specification. Maybe they need to record the timing of some sort of experimental stimulus, or they are scanning in a non-raster pattern. Should they use bits 3 and above for that? But then the specification cannot evolve to introduce new standard markers in the future. If we use separate metadata to map the bits, non-standard markers could just be left unmapped (or maybe annotated with a human-readable string).

Note that I don't care if the bits follow any vendor convention. Which hardware line is used for pixel/line/other clock is entirely up to the user, at least outside of very specific cases (even though the BH Handbook makes it hard to see this). I just want the file format to be capable of storing all of the raw data even before any interpretation or rearrangement takes place. Among other things, this can help a lot while setting up an imaging system or troubleshooting acquisition software, because you can first record and save the data and then look at all of the markers at a glance. If the file format would only store the markers you tell it to, and the result is wrong, then you need to figure out if you misconfigured or the software is buggy or the cables are connected incorrectly, and the file format fails to be a tool in the process. That seems like a lost opportunity.

smXplorer commented 3 years ago

@marktsushida In this context I need to reiterate my earlier question about whether timestamps are monotonically increasing, or just monotonically non-decreasing. If I'm not mistaken, if detector 0 and detector 1 see a photon at the same time (i.e., within the timestamp counter period), the same timestamp is repeated in /photon_data/timestamps (with 0 and 1 in /photon_data/detectors), is it not? And as far as I can tell, there is no rule about the ordering of detectors in such a case.

This is correct. The reason, I believe, why the detectors where not ordered, is that this is how we got our particular multispot data, but there is no fundamental impossibility to enforce ordering. I get that this would be advantageous in the case of markers of the type 'frame', 'line' and 'pixel' for instance, although presumably they would be in the correct order in the original data, and should stay so in the HDF5 version. So in a sense, keeping in mind that you do not favor confounding 'detectors' and 'markers', your minimal suggestion would be to enforce 'marker' order. I am not sure that is quite possible to impose without running into the risk that some users will request the opposite or a different scheme (not mentioning backward compatibility).

The whole idea of this file format being to be as general as possible without bogging down users with too many arcane rules, adding a different type of data stream for those hardware markers sounds a bit counter-productive to me.

On the contrary, it would seem to me that it is convenient to have all the timestamps ordered within each photon_data group, the task of the reader being to dispatch them in separate buckets before processing them to their heart's content (rather than having to check two different types of arrays to figure out their respective relation).

I am not quite sure why having one time stamp tagged as "pixel", then "line" is such an impediment (it will generally not even be associated with a photon). If you find "pixel", just make sure that the subsequent identical timestamps, assuming there are any, do not contain a "line" or whatever tag of interest needs to be checked. In other words, collect all identical timestamps and act according to the collection of tags collected for this timestamp. Just a thought...

Getting back to your first question about app-based bit assignment -- are you proposing something like bit 0 = pixel clock, bit 1 = line clock, and bit 2 = frame clock? Aside from the fact that you could still have unused bits (because any 1 of those 3 clocks is sufficient for recovering a FLIM image, and not all setups have all 3), the difficulty I see is that then it becomes hard for users to use this format to store additional or non-standard markers that are not (yet) provided by the format specification. Maybe they need to record the timing of some sort of experimental stimulus, or they are scanning in a non-raster pattern. Should they use bits 3 and above for that? But then the specification cannot evolve to introduce new standard markers in the future. If we use separate metadata to map the bits, non-standard markers could just be left unmapped (or maybe annotated with a human-readable string).

I am leaving this on the back-burner at this time, since it doesn't apply if markers are just one type of 'detector'.

I just want the file format to be capable of storing all of the raw data even before any interpretation or rearrangement takes place. Among other things, this can help a lot while setting up an imaging system or troubleshooting acquisition software, because you can first record and save the data and then look at all of the markers at a glance. If the file format would only store the markers you tell it to, and the result is wrong, then you need to figure out if you misconfigured or the software is buggy or the cables are connected incorrectly, and the file format fails to be a tool in the process. That seems like a lost opportunity.

I understand your concern. If I understand you correctly, you would want to have a quick way of looking only at the tags of interest. I can offer you a simple solution within the current framework: create a photon_data group that only includes those tags, no photon timestamps. Would that work?

marktsuchida commented 3 years ago

@smXplorer Apologies for the long delay.

I think you have convinced me that it's better to put marker timestamps in the same array as the photon timestamps. I agree that the undefined ordering within elements at the same timestamp is a minor annoyance at most, and can be dealt with cleanly (esp. if the spec makes it clear that this is a requirement for readers).

Another issue was with the fact that nanotimes and particles will then contain unused elements, but that is also most likely not a problem (though the spec should probably say that zeros should be stored in unused elements, and also provide metadata indicating which pixel ids have valid nanotimes or particle ids).

I guess then it's a matter of extending detectors_specs to be able to indicate which detector pixel ids are markers, and extending /setup to be able to indicate the number of markers and (optionally) specify their meaning (e.g. line clock). And such an arrangement can be made to address my other concerns about marker numbering/naming (see below).

I get that this would be advantageous in the case of markers of the type 'frame', 'line' and 'pixel' for instance, although presumably they would be in the correct order in the original data, and should stay so in the HDF5 version.

No, they will not (in general) be in any particular order in the original data. But that's okay -- once it is made explicit that readers are responsible for scanning ahead to look at all elements with the same timestamp, it is no longer a problem. I think the reason I was bothered with this had more to do with the fact that the spec never mentions what happens when timestamps from different detectors collide; it wasn't initially clear to me that this was even allowed, because devices with a single TAC or TDC don't produce such data even if multiplexed; yet for markers it is crucial. (I was also operating under the general notion that formats that only allow a single representation for any given data are better, but that can be traded off.)

If I understand you correctly, you would want to have a quick way of looking only at the tags of interest.

Not particularly. My main concern is whether one can save all of the raw data at hand, even when some of it is supposedly not used and/or the interpretation for some/all of it is not (yet) available.

I'm not saying that people should in general write files that don't record which marker is what; my point is that it should be possible to do so, or else the format will be harder to use for developing new applications and troubleshooting work-in-progress systems, or to salvage data saved by misconfigured software.

This can be cleanly addressed under a scheme where markers are stored in /photon_data/timestamps if the detectors_specs has fields that indicate which detector pixel ids are markers (without saying what they mark), and separately /setup contains (optional) fields that indicate what each marker is used for.

For example, there could be detectors_specs/marker_ch1 and so on, which will typically each have a length of 1. Then, /setup could contain num_markers (required) and other optional fields that say which (if any) marker is used for what (line clock, pixel clock, etc.).

smXplorer commented 3 years ago

@marktsuchida

Another issue was with the fact that nanotimes and particles will then contain unused elements, but that is also most likely not a problem (though the spec should probably say that zeros should be stored in unused elements,

Of course nanotimes and particles are only needed if (i) the data is TCSPC or (ii) the data comes from a simulation. But if either of those conditions is true, then the timestamps corresponding to frame, line', orpixelshould be associated with _whatever_ you feel like providing asnanotimesandparticles` values (since they will be ignored, but something indeed needs to be present). That is the role of the writer software to check the validity of the generated file (e.g. phforge for standard files). I don't think we need to enforce any specific value, but zero is fine.

and also provide metadata indicating which pixel ids have valid nanotimes or particle ids).

I am not following. detector pixel ID's (aka pixel ID's) specify how to interpret each timestamp in a photon_data group. As we discussed above, the FLIM markers (line, pixel, etc.) will appear as particular pixel ID's and it will be the responsibility of the reader to figure out what to do with them. If the file contains, say, nanotimes, then timestamps associated with the line pixel ID, will need to have corresponding nanotimes (and it won't matter which value they are). In short, I believe we don't need additional metadata, but maybe a FAQ addressing this specific concern. Did I get it right?

I guess then it's a matter of extending detectors_specs to be able to indicate which detector pixel ids are markers, and extending /setup to be able to indicate the number of markers and (optionally) specify their meaning (e.g. line clock). And such an arrangement can be made to address my other concerns about marker numbering/naming (see below).

I was mulling over that after our previous exchange, trying to see whether we could dispense with defining an additional specification, but I haven't been able to come up with anything, so I would tend to agree about the need to define a new category in the setup group that advertises the presence of spatio-temporal markers, and corresponding fields in the detector_specs subgroup.

For example, there could be detectors_specs/marker_ch1 and so on, which will typically each have a length of 1. Then, /setup could contain num_markers (required) and other optional fields that say which (if any) marker is used for what (line clock, pixel clock, etc.).

I have no objection with the above suggestion of field names, but they might be either overly general (num_marker) or a tad vague (marker_ch1). Maybe num_space_time_markers in the setup group and space_time_pixel, space_time_line, space_time_frame in the detector_specs subgroup? Since I am not using this type of data, I am not sure what would work...

@talaurence maybe can chip in.

marktsuchida commented 3 years ago

@smXplorer I don't feel super strongly about zeroing and/or indicating unused elements of nanotime etc. -- the only reason I mentioned it is that whenever there is an unused field within a data format whose meaning is not defined, somebody may start using it for whatever they feel like, leading to divergent variant formats (I've worked with the TIFF format too much in my lifetime not to have this reflexive thought :). If you'd rather minimize the number of named fields in the format, in the name of simplicity, I think that's a reasonable tradeoff.

With that out of the way:

The field names that I gave as an example were really meant just as an example: I wanted to convey the basic structure, not the names. I like the names containing space_time_marker. But are you opposed to having, say, detectors_specs/space_time_ch1 (and 2, 3, ...) with /setup/space_time_pixel (and _line, _frame)? I see space_time_ch1 as being analogous to spectral_ch1 and /setup/space_time_pixel etc. as being similar in their role to /setup/detection_wavelengths (and therefore optional). Again, I don't feel strongly about exact naming, but I do about the separation of marker channel designation from marker interpretations. The current format allows having spectral channels, indicated as such, without necessarily providing their detection wavelengths. I'm just saying that it would be nice to similarly be able to have spatiotemporal marker channels, indicated as such, without necessarily providing their role/interpretation.

smXplorer commented 3 years ago

@marktsuchida

I don't feel super strongly about zeroing and/or indicating unused elements of nanotime etc. -- the only reason I mentioned it is that whenever there is an unused field within a data format whose meaning is not defined, somebody may start using it for whatever they feel like, leading to divergent variant formats (I've worked with the TIFF format too much in my lifetime not to have this reflexive thought :). If you'd rather minimize the number of named fields in the format, in the name of simplicity, I think that's a reasonable tradeoff.

The point is that zero (or any other value for that matter) is a perfectly valid timestamp (if only encountered at the beginning of a stream, if ever) and/or nanotime/particle. We could decide to use a negative number to signal a 'bogus' timestamp, but that would be enforcing a signed integer representation for timestamps. For the 'typically' unsigned fields (nanotime/particle), we don't even have this possibility, although again there is no particular requirement for the data type for these fields. Bottom line, we could enforce data types and use that to further specify "valid" versus "invalid" values, but I wonder whether this is a decision we could come to regret down the road.

As far as people trying to hijack the format to pass on proprietary or undocumented data, I am afraid that can't be prevented... Any suggestion to improve the hdf5 module of phconvert module is welcome!

The field names that I gave as an example were really meant just as an example: I wanted to convey the basic structure, not the names. I like the names containing space_time_marker. But are you opposed to having, say, detectors_specs/space_time_ch1 (and 2, 3, ...) with /setup/space_time_pixel (and _line, _frame)? I see space_time_ch1 as being analogous to spectral_ch1 and /setup/space_time_pixel etc. as being similar in their role to /setup/detection_wavelengths (and therefore optional). Again, I don't feel strongly about exact naming, but I do about the separation of marker channel designation from marker interpretations. The current format allows having spectral channels, indicated as such, without necessarily providing their detection wavelengths. I'm just saying that it would be nice to similarly be able to have spatiotemporal marker channels, indicated as such, without necessarily providing their role/interpretation.

The denomination chn made sense for physical detectors collecting photons post "filtering" by some optics part. I am not so sure it conveys any meaning in the case of "markers". In fact, it might result in confusion instead. However, to go in your direction, I suppose that we could have something like: detectors_specs/space_time_marker1 detectors_specs/space_time_marker2 etc.

in the detector_specs subgroup, together with a num_space_time_markers in the setup group.

Note that while optional, the setup fields listed as such are highly recommended for reproducibility purposes, and the same would be the case for those space_time_markers. I am not sure what else but an array of strings can be used to describe those markers, and if so, whether some universal and unambiguous list can be provided: pixel, line, frame, anything else?

marktsuchida commented 3 years ago

@smXplorer Thanks. I like the detectors_specs/space_time_marker1 (and so on) naming.

I cannot agree more that all relevant information in /setup should be written whenever available. It's just that the fact that some of them are optional helps making the format more adaptable in some situations (either when the correct information is truly not available or when the format to record the information is not defined in the spec).

Note that I'm viewing this from the perspective of writing HDF5 files using my own code, not phconvert, because in my work the data will (hopefully) go straight from the device to a Photon-HDF5 file, without using vendor file formats for temporary storage. (Also the programming language is not Python.) In this setting, if the file format allows me to store "unused" or "uninterpreted" markers, it's extremely useful for troubleshooting (either during development or when setting up a new system). If the format doesn't allow this, then I would have no choice other than to come up with my own format (or use the vendor format) to save "raw" data, which seems unfortunate. Of course, once the software is debugged and configured, the software should correctly record the correspondence between marker number and pixel/line/frame (but might still record a 4th, unused marker that may be useful in forensic examination).

In other words, I'm trying to make the point that this file format can start being useful even before one gets to the point of being able to record scientific data from successful experiments (and at least some of us spend more time building microscopes then acquiring data on them).

I'm also fine with your scheme of having a string array /setup/space_time_markers as opposed to the individual fields I proposed. I realize this is more in line with how wavelengths, etc., are recorded.

I cannot, in the context of FLIM, think of any values for the elements of that field other than pixel, line, frame, and blank (= "not used or unknown") -- at least none that seem worth standardizing at this time. FLIM does not require the scan to be a 2D raster, but I don't think it is worth coming up with standards for non-raster scanning, especially at this stage. A resonant scanner would require different considerations, but is very rare in FLIM as far as I know, and the markers would still be line and frame.

So unless other people have other ideas, I would be happy with only standardizing pixel, line, frame (all optional but at least one required for a valid FLIM dataset), and blank. Anything else could be added in the future when/if a strong argument arises for their need. (So readers should ignore markers that they don't know how to interpret.)

Moving on, since we are already talking about FLIM-specific markers, in order to completely describe a FLIM experiment there needs to be additional metadata in /setup. The important ones are

These are central to FLIM and, among other things, allow FLIM image recovery from any dataset containing at least one of pixel/line/frame markers.

Raster width and height should most certainly be required for FLIM datasets (only a data recovery tool should have to deal with datasets that don't record raster size).

The pixel period (aka pixel time, but period is less ambiguous) should be more or less required, though images can be recovered without it if the pixel marker is present. The line period is optional if the line marker is present (usually it will be) or the pixel marker is present and excess pixel markers do not occur during retrace. There could also be an optional frame period field, but I don't think there is much use for it.

Most datasets will have some redundancy of information between the marker timestamps, periods, and raster size, and there is more than one (nearly but not strictly equivalent) way to assign photons to pixels given such data. Most readers will only implement one or a few strategies, and they should simply check that the markers and parameters required for their operation are present and otherwise reject the file or try another strategy.

A couple of additional fields that are not required for constructing FLIM images but might be worth including are

smXplorer commented 3 years ago

@marktsuchida Re: In this setting, if the file format allows me to store "unused" or "uninterpreted" markers, it's extremely useful for troubleshooting (either during development or when setting up a new system). If the format doesn't allow this, then I would have no choice other than to come up with my own format (or use the vendor format) to save "raw" data, which seems unfortunate. Of course, once the software is debugged and configured, the software should correctly record the correspondence between marker number and pixel/line/frame (but might still record a 4th, unused marker that may be useful in forensic examination).

Using photon-HDF5 (ph5) doesn't preclude you from developing your alternate or intermediate format. It was never designed to be as flexible as a programming language. It is here to exchange data in a reproducible way. As a matter of fact, we are using an intermediate hdf5 format to store raw data, which phforge is supposed to feed, together with a yaml file containing a bunch of metadata, to phconvert. The intermediate file "looks" like a simplified ph5 format but is not ph5.

Re: additional /setup fields, I don't see any problems with those. However, considering that your suggestions are not conflicting with the current 0.5.dev version (which is not a full release, but is supported by phconvert, AFAIK), I would lean towards releasing 0.5 as an official version first, and then immediately adding your suggestions to a 0.6.dev version, so that others can get a chance to provide their input (and phconvert to be updated to support it). The point is that phconvert is ready now for 0.5, but may take a bit to adjust to 0.6. Once 0.6.dev is published, it can be used by people like you, even though phconvert is not necessarily ready for it.

@talaurence Any comments/suggestions?

marktsuchida commented 3 years ago

@smXplorer Sounds good; I think that is a good plan, especially seeing that /setup/detectors is new in 0.5.

I will start working on a PR so that we have something more concrete to hammer on (it will probably take me a few weeks).