tritemio commented 8 years ago

This is a proposal for a new addition to the Photon-HDF5 format (Photon-HDF5 Enhancement Proposal, PEP) in order to record detector-specific information which is currently (<=0.4) not well supported.

Anyone interested in Photon-HDF5 is encouraged to participate in this discussion. We especially seek comments from users outside the Photon-HDF5 organization (which currently is me, @smXplorer and @talaurence). We aim to reach a consensus and release the new version as Photon-HDF5 version 0.5 in the coming months.

Summary

Currently, there is no way to save pixel or detector-specific information in a Photon-HDF5 file (except when using a custom user group). For example, we would want to save detector dark counting rate (DCR), afterpulsing probability and other detector characteristics. Additionally, we have several open issues which can be addressed by adding detector-specific data as discussed below.

In brief, the proposal is to add a set of groups (one for each detector/pixel) in the setup group, which can contain all the needed detector-specific data.

Addressing previous issues

Here I list all open issues that can be solved by the addition of detectorN groups.

Issue #30 proposes to add the number of counts (i.e. number of timestamps) present for each detector. This info can be saved as a counts field in the detector-specific group.
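As a rough illustration of how such a counts value could be computed before saving, here is a minimal sketch. It assumes the per-photon detector IDs are available as a NumPy array; the array contents are made up for the example.

```python
import numpy as np

# Hypothetical per-photon detector IDs, as they might appear in photon_data/detectors
detectors = np.array([4, 6, 4, 4, 6, 4], dtype=np.uint8)

# Unique detector IDs and the number of timestamps recorded by each one
ids, counts = np.unique(detectors, return_counts=True)

# Map each detector ID to its count, ready to be stored per detector
counts_by_id = dict(zip(ids.tolist(), counts.tolist()))
print(counts_by_id)
```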

Issue #5 deals with adding a human-readable label for each detector. This can also help to disambiguate situations where there are more ??? than the standard detectors (see Issue #15). We can add a label field to the detector-specific group.

Issues specific to SPAD arrays

Issue #31 asks whether it is possible to save "dead pixel" information, a situation encountered when one or more pixels of a SPAD array stop working. This information could be saved as a detector with counts = 0.

Issue #18 proposes to save the module each pixel belongs to and the "position" of the pixel on the chip. Similarly, issue #1 proposes to add (x,y) pixel coordinates in an array. This information can be saved as module and position (or something similar) in the detector-specific group.

Proposal

I propose to add a new (optional) set of groups (one for each detector) in the setup group. These can be called detector0, detector1, ... detectorN. Inside each of these groups will be detector-specific fields. All fields will be optional and only added when needed (or when the user who saves the file wants to save this extra information).

The current proposal is:

setup/
    detector0/
        label (string): a human-readable label for the detector
        id (int): number used by the acquisition hardware to identify the pixel.
        counts (int): number of timestamps counted by detector0
        module (string): name of the module the pixel belongs to [multispot]
        position (array of int): x,y position of the pixel in the array [multispot]
        dcr (float): dark counting rate in Hz for the pixel
        afterpulsing (float): afterpulsing probability for the pixel
        spot (int): the spot number this pixel is used in [multispot]
    detector1/
    ...
    detectorN/
    ...

Backward compatibility

This proposal preserves backward compatibility: it only adds new optional fields. Software that reads Photon-HDF5 version 0.4 will be able to read Photon-HDF5 0.5; it will simply ignore the new fields. New software that supports version 0.5 should check whether these new fields exist and load the information they contain if needed.

smXplorer commented 8 years ago

The last paragraph doesn't make sense to me. In our discussion, the idea was that if per-pixel detector specs were needed, for instance TCSPC information, they would be found in the detectorN groups, but not in the nanotimes_specs subgroup. So, if v0.4-compatible software tries to read the new files, it won't find a nanotimes_specs subgroup and will be missing information needed to load the file successfully.
Another piece of information that could be put in this new detector group is the number of particles.
What would be the afterpulsing probability range: 0-1 or 0-100?

tritemio commented 8 years ago

Let's leave the nanotimes_specs issue out here. We can open a separate issue for that.

Q2: I don't understand what you mean.

Q3, I would say 0...1 range.

smXplorer commented 8 years ago

Re: 2. for simulations, the number of particles is currently per spot, but it might be interesting to differentiate between particles seen by one pixel but not by the other (e.g. if the particles are either red or green).

tritemio commented 8 years ago

One issue needs to be addressed. Writing it down before I forget.

We have a group named detector[i] which contains a field called id. In the current proposal, id is the original number used for the detector by the acquisition hardware. For example, in a single-spot smFRET acquisition we could have detectors 4 and 6. On the other hand, the detector groups are numbered progressively without gaps to make it easier for a reader.

Question is, what do we put in the detectors array in photon_data?

A. We can save the original detector ID from the acquisition hardware

B. We rename each detector number to the corresponding [i] used in the detector group.

The former makes it simpler to save data (it does not require the detector mapping step). The latter makes reading easier: starting from a value in the detectors array, it is straightforward to find which detector group is associated with it.

For option A, however, there is no big issue. A reader can read all the detector groups first and build the association using the id field (and the spot field for multispot), then read the detector array (and this would only be required when a user wants to read this extra detector information).
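A minimal sketch of the association a reader would build under option A. The id values are hypothetical and stand in for the id fields read from detector0, detector1, and so on; no file I/O is shown.

```python
# Hypothetical `id` fields, in the order of the groups detector0, detector1, ...
group_ids = [4, 6]

# Build the hardware-ID -> group-index association once
id_to_group = {det_id: index for index, det_id in enumerate(group_ids)}

# Hypothetical values from the detectors array in photon_data
detectors = [4, 6, 6, 4]

# Resolve each photon's detector value to its detector group index
group_index = [id_to_group[d] for d in detectors]
print(group_index)
```

For multispot files the dictionary key would presumably be the (id, spot) pair rather than id alone.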

Now that I wrote down the two options, I lean toward option A.

smXplorer commented 8 years ago

If the field is an ID, it doesn't have to follow any particular order. However, each ID needs to be unique. Internally, ALiX builds an ID <-> index map and adds a spot ID to each pixel ID, because the pixel IDs are NOT unique. It is manageable, but it is an added level of complexity that could be avoided by having one unique ID per pixel.

tritemio commented 8 years ago

An alternative layout uses a single detectors group with one array per field. The length of each array is the number of detectors, and the index in the array is the detector number.

setup/
    detectors/
        label (array of string): a human-readable label for the detector
        id (array of int): number used by the acquisition hardware to identify the pixel.
        counts (array of int): number of timestamps counted by each detector
        module (array of string): name of the module the pixel belongs to [multispot]
        position (2-D array of int): columns are x,y positions of each pixel in the array [multispot]
        dcr (array of float): dark counting rate in Hz for the pixel
        afterpulsing (array of float): afterpulsing probability for the pixel
        spot (array of int): the spot number this pixel is used in [multispot]

This is simpler and scales better to a high number of detectors.

tritemio commented 8 years ago

Additional thoughts. The common use case is going from the detectors array in photon_data to the metadata for the detector in /setup/detectors.

From this point of view, it would be easier to have unique (and sequential, i.e. no gaps) detector IDs in the detectors array. The detector ID is then also used as the index in /setup/detectors to find the relevant metadata. The original detector number (from the acquisition hardware) can be preserved in /setup/detectors/id.

This is basically option B above. The advantages are:

simple structure
easy to read
channels can be merged keeping detectors distinct

Drawbacks are:

possibly a bit less compression
some additional processing when saving (one-time processing) to remove duplicates or gaps.
compatibility: an old 1-spot data file with let's say, 4 and 6 in detectors becomes incompatible?
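The one-time processing needed when saving could look roughly like the sketch below. It assumes the detectors array fits in memory as a NumPy array and uses made-up values; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical hardware detector numbers as recorded during acquisition
detectors = np.array([4, 6, 6, 4, 6], dtype=np.uint8)

# hardware_ids: sorted unique hardware numbers (candidate for /setup/detectors/id)
# sequential:   gap-free indices 0..N-1 to store in the detectors array
hardware_ids, sequential = np.unique(detectors, return_inverse=True)

# The original numbers can always be recovered from the two arrays
restored = hardware_ids[sequential]
print(hardware_ids, sequential)
```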

Compatibility

We can handle the compatibility as follows.

Compat A If /setup/detectors is not present nothing changes. If it is present then detector IDs in photon_data/detectors need to be unique and sequential.

Compat B We add a new field measurement_specs/detectors_specs/det_id_map which contains the mapping between the detector ID used in detectors array and the "index" used in /setup/detectors/.

tritemio commented 7 years ago

Having unique detector IDs and no gaps in numbering is too limiting.

Sometimes detector IDs may be non-sequential (e.g. 4 and 6). Simply preserving the original detector number is more transparent.
Another issue is that in multispot measurements unique detector IDs cause:

detectors_specs that are different for each spot (preventing copying the same measurement_specs in all the spots when writing)
for more than 256 pixels the detectors array needs to use 2 bytes instead of 1.

These issues can be avoided with a unique detector identifier made of the (id, spot) fields in /setup/detectors. Values in /photon_dataN/detectors can be unique across the spots, but they don't have to be. For example, donor/acceptor detectors can be 0,1 for all the spots. From the point of view of the analysis, there is no need for unique detector identifiers in /photon_dataN/detectors, since the processing is always done spot by spot. The only concern with this approach would be preserving the ID assigned to each detector by the acquisition hardware. We can save this information in a new field id_hardware in /setup/detectors/.

To find a detector in /setup/detectors/ starting from a value in /photon_dataN/detectors we simply find the matching /setup/detectors/id (in a given spot, if multi-spot). This index can be used to retrieve all the other info for this specific detector.
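A minimal sketch of this lookup, assuming the id, spot and dcr arrays from /setup/detectors have already been loaded as NumPy arrays. The values and the helper name find_detector_row are hypothetical.

```python
import numpy as np

# Hypothetical contents of /setup/detectors for a 2-spot, 2-color measurement
det_id = np.array([0, 1, 0, 1])
det_spot = np.array([0, 0, 1, 1])
dcr = np.array([120.0, 95.0, 310.0, 80.0])  # Hz


def find_detector_row(value, spot):
    """Return the row in /setup/detectors matching an (id, spot) pair."""
    matches = np.flatnonzero((det_id == value) & (det_spot == spot))
    if matches.size != 1:
        raise LookupError(f"expected one match for id={value}, spot={spot}")
    return int(matches[0])


# Detector value 1 found in /photon_data1/detectors (spot 1)
row = find_detector_row(1, 1)
print(row, dcr[row])
```

The same row index retrieves any other per-detector field (label, counts, afterpulsing, ...).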

Going from /setup/detectors/ to the corresponding spot is easy: we just read /setup/detectors/spot to find the spot where the detector appears.

The updated /setup/detectors/ is

setup/
    detectors/
        label (array of string): *optional*, a human-readable label for the detector
        id (array of int): *required*, the numeric ID of each detector as contained in `/photon_data[N]/detectors`
        id_hardware (array of int): *optional*, the original ID assigned by the acquisition hardware to each detector
        counts (array of int): *optional*, number of timestamps counted by each detector
        module (array of string): *optional* name of the module the pixel belongs to [multispot-only]
        position (2-D array of int): *optional* columns are x,y positions of each pixel in the array [multispot-only]
        dcr (array of float): *optional* dark counting rate in Hz for the pixel
        afterpulsing (array of float): *optional* afterpulsing probability for the pixel
        spot (array of int): the spot number this pixel is used in [multispot-only]
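To make the layout concrete, here is a rough sketch that writes a few of these fields with h5py and reads them back, treating optional fields as possibly absent. It is not the phconvert implementation; the values are made up, only numeric fields are shown, and the file lives in memory (the file name is a placeholder).

```python
import h5py
import numpy as np

# In-memory HDF5 file: nothing is written to disk
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate group "setup" is created automatically
    det = f.create_group("setup/detectors")
    det.create_dataset("id", data=np.array([0, 1, 0, 1], dtype=np.uint8))
    det.create_dataset("spot", data=np.array([0, 0, 1, 1], dtype=np.uint8))
    det.create_dataset("dcr", data=np.array([120.0, 95.0, 310.0, 80.0]))

    # Reader side: the group and its optional fields may be missing
    setup = f["setup"]
    if "detectors" in setup:
        group = setup["detectors"]
        ids = group["id"][()]
        dcr = group["dcr"][()] if "dcr" in group else None
        afterpulsing = (
            group["afterpulsing"][()] if "afterpulsing" in group else None
        )

print(ids, dcr, afterpulsing)
```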

tritemio commented 7 years ago

The new /setup/detectors group is now documented in the 0.5 specification draft:

http://photon-hdf5.readthedocs.io/en/0.5.dev/phdata.html#detectors-group

and implemented in phconvert dev branch.

tritemio commented 7 years ago

[x] Add /setup/detectors documentation
[x] Implement /setup/detectors in phconvert

Photon-HDF5 / photon-hdf5

PEP: Add per-pixel information #33
