Photon-HDF5 / photon-hdf5

Photon-HDF5 Reference Documentation
http://photon-hdf5.readthedocs.io/

Proposal for Photon-HDF5 version 0.3 #8

Closed tritemio closed 9 years ago

tritemio commented 9 years ago

This document summarizes the new layout proposed for version 0.3 of the Photon-HDF5 file format.

The format includes a mandatory set of root fields, and at least a photon-data group.

Root group

Measurement description fields

The following parameters in the root group are present in all types of measurement. They are sufficiently general to warrant their inclusion in all files. Measurement-specific information necessary to interpret the data is provided in the measurement-specific group(s) described later.

Note 1: it is still to be decided whether all these parameters must be considered mandatory.

Note 2: having these root parameters makes it possible to support "incomplete" or "test" measurements that are not smFRET and/or ALEX, for example the multi-spot preliminary measurements with one laser excitation and one detection band. These simple "test" measurements don't need a separate measurement group because the root parameters are enough for interpretation. In other words, we keep the simple case simple.

Measurement type

The field:

measurement_type

contains a string that identifies the measurement type. Examples of possible values are:

This group named measurements_specs is placed inside the photon_data group so the multispot case is easily supported.

"smFRET"

detectors_specs/
    donor
    acceptor

"smFRET us-ALEX"

alternation_period
alternation_period_donor
alternation_period_acceptor
detectors_specs/
    donor
    acceptor

"smFRET ns-ALEX"

laser_pulse_rate
alternation_period_donor
alternation_period_acceptor
detectors_specs/
    donor
    acceptor

"smFRET us-ALEX-3C"

alternation_period
alternation_period_blue
alternation_period_green
alternation_period_red
detectors_specs/
    blue
    green
    red

[X: Not sure BGR is the name you want to give to them. Maybe wavelength_n, n = 1, 2, 3?]

photon-data group layout:

/photon-data/
    timestamps
    timestamps_specs/
        timestamps_unit
    detectors
    nanotimes
    nanotimes_specs/
        tcspc_unit
        tcspc_range
        tcspc_num_bins

Notes

tritemio commented 9 years ago

On second thought, I prefer the following layout. The concept does not change, but this way we can share "logic" between the different measurement types:

/photon_data
    timestamps
    detectors
    ...
    measurement_specs/
        measurement_type: "smFRET", "smFRET-usALEX", etc...

        detectors_specs/ 
            # For all 2-color detection measurements
            donor
            acceptor

            # For all 3-color detection measurements
            blue
            green
            red

            # For measurements that record polarization
            polarization1
            polarization2

        # For us-ALEX, 2, 3 or N colors
        alex_period

        # For ns-ALEX (or lifetime with no alternation, i.e. simulations)
        laser_pulse_rate

        # For 2-color us-ALEX and ns-ALEX (optional)
        alex_period_donor
        alex_period_acceptor

        # For 3-color us-ALEX and ns-ALEX (optional)
        alex_period_blue
        alex_period_green
        alex_period_red
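
The layout above implies that different measurement types need different fields. A minimal sketch of such a check follows, with `measurement_specs` modeled as a plain nested dict rather than an HDF5 group. The helper name `missing_specs` and the substring test on the type string are assumptions made for illustration; only "smFRET" and "smFRET-usALEX" are spelled out in the proposal, so "nsALEX" as a substring is a guess at the naming.

```python
def missing_specs(specs, colors=("donor", "acceptor")):
    """Return the measurement_specs fields that are expected but absent.

    `specs` is a nested dict standing in for the measurement_specs group.
    `colors` lists the detectors_specs entries expected for the measurement
    (donor/acceptor for 2-color, blue/green/red for 3-color).
    """
    mtype = specs.get("measurement_type", "")
    missing = []
    if not mtype:
        missing.append("measurement_type")

    det_specs = specs.get("detectors_specs", {})
    missing += [f"detectors_specs/{c}" for c in colors if c not in det_specs]

    # us-ALEX needs the alternation period (assumed naming: "usALEX").
    if "usALEX" in mtype and "alex_period" not in specs:
        missing.append("alex_period")
    # ns-ALEX needs the laser pulse rate (assumed naming: "nsALEX").
    if "nsALEX" in mtype and "laser_pulse_rate" not in specs:
        missing.append("laser_pulse_rate")
    return missing


specs = {"measurement_type": "smFRET-usALEX", "detectors_specs": {"donor": [0]}}
print(missing_specs(specs))
```

The optional per-color fields (`alex_period_donor` and so on) are deliberately not checked, since the proposal marks them as optional.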

[X: What happens when you have spectral and polarization information?]

tritemio commented 9 years ago

To answer Xavier's questions:

Q1

  • /num_spots number of parallel excitation/detection spots [X: the notion of spot might not be the most general one. What about "spatial source" instead?]

"Spatial source" is more opaque in the most common case. Let's call it "num_spots" and describe it as "number of parallel excitation and/or detection spots". This is general enough: with any kind of illumination and point detectors you have "spots", and with widefield detectors (H33D-like) it would be the number of pixels.

Q2

  • /alex (boolean) True (i.e. = 1) when the measurement uses any form of alternated/interleaved excitation at different wavelengths (us-ALEX, ns-ALEX or PIE, PAX) [X: what about when there are multiple excitations but not alternated? e.g. Klenerman's group]

With no alternation, alex == False. The number of excitation wavelengths can be another field, currently /setup/excitation_wavelengths (list of wavelengths).

In any case, the complete description of 2 CW excitations would probably require a dedicated entry in measurement_specs.

Q3

  • /lifetime (boolean) True (i.e. = 1) when the data contains nanotime information [X: what about when there is "virtual" information? e.g. when the timestamps are provided with enough precision and are correlated to laser excitation such that lifetime analysis is possible]

Which hardware provides it? PicoQuant and Becker&Hickl have separate fields. In the case of PicoQuant you can combine them to obtain a high-resolution timestamp, but that is a post-processing step.
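
The post-processing step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the coarse timestamp is taken to count laser sync periods, the nanotime is taken to be measured forward from the sync, and both units are passed as integer picoseconds so that the sum stays exact. The function name and the `_ps` parameters are hypothetical and not part of the format.

```python
def absolute_times_ps(timestamps, nanotimes, timestamps_unit_ps, tcspc_unit_ps):
    """Combine coarse timestamps and TCSPC nanotimes into integer picoseconds.

    Assumes nanotimes are measured forward from the sync pulse that the
    matching timestamp counts; hardware that records reversed start-stop
    times would need the nanotime flipped first.
    """
    if len(timestamps) != len(nanotimes):
        raise ValueError("timestamps and nanotimes must have the same length")
    return [t * timestamps_unit_ps + n * tcspc_unit_ps
            for t, n in zip(timestamps, nanotimes)]


# 50 ns sync period (20 MHz laser) and 16 ps TCSPC bins, both illustrative.
print(absolute_times_ps([0, 3], [100, 25], 50_000, 16))
```

Integer arithmetic is used on purpose: a float64 value in seconds cannot hold picosecond resolution over a long acquisition.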

Q5

[X: what do you mean by 'named after'? as in "GW Bush was named after his father GH Bush"?]

Yes, but that was in the old struck-out draft, which I have now deleted.

Q6

[X: Not sure BGR is the name you want to give to them. Maybe wavelength_n, n = 1, 2, 3?]

BGR was suggested by Ted. Either way is fine with me; I would just use the most common nomenclature.

Q7

  • detectors can be a 1D array or a 2D array. Each row always represents the detector(s) of one timestamp (represented by a single ID or a n-tuple of ID). [X: what is an ID? A detector spec?]

No, it is one element (or one row) in the detectors array. It is just a way to refer to an n-tuple of numbers that identifies a detector. In the common case it will be a 1-element tuple, i.e. just an integer. For a 2-D detector it may be a pair of integers (X, Y).
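
To make the ID notion concrete, here is a small sketch that turns either form of the detectors array into one tuple per timestamp. Plain Python lists stand in for the HDF5 arrays, and the helper name `detector_ids` is hypothetical.

```python
def detector_ids(detectors):
    """Return one ID tuple per timestamp from a 1D or 2D detectors array.

    A 1D array holds a single integer per photon, giving 1-element tuples.
    A 2D array holds one row per photon, e.g. (X, Y) for a 2-D detector.
    """
    ids = []
    for row in detectors:
        if isinstance(row, (list, tuple)):
            ids.append(tuple(row))   # 2D case: the row is the n-tuple
        else:
            ids.append((row,))       # 1D case: wrap the integer
    return ids


print(detector_ids([0, 1, 0]))          # point detectors
print(detector_ids([[3, 4], [5, 6]]))   # pixels of a 2-D detector
```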

Q8

[X: What happens when you have spectral and polarization information?]

What happens? Nothing, it is fine. We can also define a dedicated measurement_type string for this case. But since you can have many combinations (usALEX + polarization, nsALEX + polarization, usALEX-3C + polarization), I would lean towards not indicating the polarization in measurement_type; if the info is there, it can be used.

Q9

[X: Why abbreviate polarization into "polariz"?]

+1 No abbreviation, I will replace it.

tritemio commented 9 years ago

More answers:

Q10

[X: why should they be orthogonal?]

They are in general, but they don't have to be. I'll call them independent polarizations.

Q11

/acquisition_time measurement duration [X: define. Is it the difference between the last and first timestamp or some different measure?]

It is a matter of implementation; it should be how long the acquisition lasts. PicoQuant and Becker&Hickl provide a field in their formats containing the "programmed" acquisition time in ms or s. In these cases I use this info when converting. Otherwise I compute last - first timestamp.
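
The rule just described can be sketched as a small function. The name and signature are hypothetical; the sketch assumes the timestamps are sorted and that any programmed time has already been converted to seconds by the caller.

```python
def acquisition_time(timestamps, timestamps_unit, programmed_time=None):
    """Return the acquisition duration in seconds.

    Uses the "programmed" duration from the vendor file when available,
    otherwise falls back to last - first timestamp.
    """
    if programmed_time is not None:
        return programmed_time
    if len(timestamps) == 0:
        return 0.0
    return (timestamps[-1] - timestamps[0]) * timestamps_unit


print(acquisition_time([10, 250, 1210], 0.5))                        # fallback
print(acquisition_time([10, 250, 1210], 0.5, programmed_time=60.0))  # vendor value
```

The fallback slightly underestimates the real duration, since it ignores the time before the first photon and after the last one.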

talaurence commented 9 years ago

For 3c-ALEX, let's go with 1, 2, and 3 for the fluorophore labels. B, G, and R were used in the 3cALEX paper, which is why I used them.

tritemio commented 9 years ago

Ok, do you mean this?

    detectors_specs/ 
        # For all 3-color detection measurements
        wavelength_1
        wavelength_2
        wavelength_3

talaurence commented 9 years ago

Yes,

That is correct.

tritemio commented 9 years ago

What about ditching donor and acceptor and using wavelength_N for 2-, 3- and 4-color smFRET?

talaurence commented 9 years ago

That is OK with me, although some might disagree. What does Xavier think?

tritemio commented 9 years ago

Let's summarize, this would be the full file structure so far:

# Root parameters, optional but recommended
/acquisition_time
/measurement_description

/num_detectors                    (total number of pixels)
/num_spots
/num_spectral_ch
/num_polariz_ch

/alex
/lifetime

# Mandatory
/photon_data
    # Mandatory
    timestamps
    timestamps_specs/
        timestamps_unit

    # Optional if there is only 1 detector
    detectors

    # Optional
    nanotimes
    nanotimes_specs/
        tcspc_unit
        tcspc_range
        tcspc_num_bins

    # Optional, but if present must be complete
    measurement_specs/
        measurement_type: "smFRET", "smFRET-usALEX", etc...

        detectors_specs/ 
            labels      (optional) a table with 2 columns: detector ID and detector label (string). 
                        For 2-color smFRET the labels should be "donor" and "acceptor".
                        When detector ID is a n-tuple, labels has n+1 column (n for the ID 
                        and 1 for the labels).

            # For all 2-color detection (or more) measurements
            spectral_ch1
            spectral_ch2
            ...

            # For measurements that record polarization
            polarization_ch1
            polarization_ch2

            # When the detection path is split in 2 ch. through a non-polarizing beam splitter
            split_ch1
            split_ch2

        # For us-ALEX, 2, 3 or N colors
        alex_period

        # For ns-ALEX (or lifetime with no alternation)
        laser_pulse_rate

        # For 2-color (or more) us-ALEX and ns-ALEX (optional)
        alex_period_spectral_ch1
        alex_period_spectral_ch2
        ...

# Mandatory
identity/
    filename
    full_filename
    creation_time
    software
    software_version
    format_name         ALWAYS: "Photon-HDF5"
    format_version      for now "0.3" (string)

# Optional but recommended
provenance/
    author
    affiliation
    filename
    full_filename
    creation_time
    modification_time
    software
    software_version

# Optional
setup/
    excitation_wavelengths      list of excitation wavelengths
    detection_wavelengths       list of reference wavelengths for each detection spectral band

    excitation_polarizations    list of angles for each *excitation wavelength*
    detection_polarizations     list of angles for each *detection channel*

    excitation_powers

    detection_splits_ratios     list of power fractions detected by each "split" channel 
                                (i.e. detection channels generated by beam splitting 
                                through a non-polarizing beam splitter)

# Optional
sample/
    num_dyes: (integer)         number of different dyes present in the samples. 
    dye_names (array of string) list of dye names (for example: ['ATTO550', 'ATTO647N'])
    buffer_name (string)        free-form description
    sample_name (string)        free-form description
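
As a rough illustration of how the mandatory parts of this structure could be checked, the sketch below validates a nested dict that mirrors the hierarchy. Using a dict instead of an HDF5 file, the helper names, and the choice of which identity fields to require are all assumptions; the proposal only states that the identity group is mandatory and fixes the values of format_name and format_version.

```python
# Paths treated as mandatory in this sketch (an assumption for identity/*).
MANDATORY_PATHS = (
    "photon_data/timestamps",
    "photon_data/timestamps_specs/timestamps_unit",
    "identity/format_name",
    "identity/format_version",
)


def lookup(tree, path):
    """Return the node at a '/'-separated path, or None if it is absent."""
    node = tree
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate(tree):
    """Return a list of problems found in a dict mirroring the file layout."""
    errors = [f"missing {p}" for p in MANDATORY_PATHS if lookup(tree, p) is None]
    name = lookup(tree, "identity/format_name")
    if name is not None and name != "Photon-HDF5":
        errors.append("format_name must be 'Photon-HDF5'")
    version = lookup(tree, "identity/format_version")
    if version is not None and not isinstance(version, str):
        errors.append("format_version must be a string")
    return errors


example = {
    "photon_data": {
        "timestamps": [1, 2, 3],
        "timestamps_specs": {"timestamps_unit": 12.5e-9},
    },
    "identity": {"format_name": "Photon-HDF5", "format_version": "0.3"},
}
print(validate(example))
```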

tritemio commented 9 years ago

For wavelengths, spectral bands and dye names, the convention is to list them in increasing order of reference wavelength (blue to red).
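
A converter could apply this ordering convention with a helper like the one below. It is a sketch: the function name is hypothetical, and the wavelengths in the example are illustrative values in nanometres, not dye properties taken from the thread.

```python
def order_by_wavelength(names, wavelengths):
    """Sort names and wavelengths together, blue to red."""
    if len(names) != len(wavelengths):
        raise ValueError("names and wavelengths must have the same length")
    pairs = sorted(zip(wavelengths, names))
    return [n for _, n in pairs], [w for w, _ in pairs]


dye_names, reference_wavelengths = order_by_wavelength(
    ["ATTO647N", "ATTO550"], [647, 550]
)
print(dye_names, reference_wavelengths)
```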

smXplorer commented 9 years ago

Comments:

smXplorer commented 9 years ago

Another thing that needs to be accounted for is the presence of non-polarizing beam splitters in the setup. In that case, two (or more) channels will have identical wavelength. The more I think of it, the more I'd like to have a nice graphical interface (with a few preset templates) to define what the setup looks like :)

tritemio commented 9 years ago

@smXplorer , point by point reply

  1. We must explicitly state in the field description that, for 2-color smFRET, spectral_ch1 is the donor and spectral_ch2 is the acceptor. We have to enforce (in the reference implementation) that the convention is respected. But it is fairly intuitive that spectral_ch1 < spectral_ch2, so it does not add a big burden on the user IMHO.
  2. Since the root field is just for describing simple experiments that may not have a dedicated measurement_specs group, we can rename alex to modulated_excitation (or something similar), which will be True for any sort of excitation modulation/alternation and False only with CW excitation.
  3. See the previous point. In this case modulated_excitation would be True and you also need to specify a measurement_specs group to completely characterize the measurement.
  4. Good point, I'll add an (optional) DOI field in the identity group.

tritemio commented 9 years ago

I agree that the case of "presence of non-polarizing beam splitters" is really important and must be included.

Following the same logic used for spectral and polarization channels, we can add some new detectors_specs fields to indicate the detectors in each beam-split channel. What about these names:

detectors_specs/
    split_ch1
    split_ch2
    ...

And for completeness we could add in /setup:

setup/
    detection_splits_ratios     list of power fractions detected by each "split" channel 
                                (i.e. detection channels generated by beam splitting 
                                through a non-polarizing beam splitter)

tritemio commented 9 years ago

And to complete the detectors_specs group, we can also add a labels field that associates a string label with each channel. This field can store the detector labels found in .SM files or any other label (for example donor/acceptor) the user wants to set. We could also recommend assigning the labels donor and acceptor in standard 2-color smFRET experiments.

One use case would be a comparative study of different detector types. In this case the label can contain the detector name.
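
A sketch of how the rows of such a labels table could be assembled is shown below, following the column rule from the summary (n columns for the ID plus one for the label). The helper name and the dict input are assumptions, and how the rows are finally stored in HDF5 is left open.

```python
def labels_table(labels):
    """Build labels rows from a dict mapping detector ID to a label string.

    An integer ID gives a 2-column row (ID, label); an n-tuple ID gives
    an (n + 1)-column row.
    """
    rows = []
    for det_id, label in labels.items():
        id_tuple = det_id if isinstance(det_id, tuple) else (det_id,)
        rows.append(id_tuple + (label,))
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all detector IDs must have the same length")
    return rows


print(labels_table({0: "donor", 1: "acceptor"}))
print(labels_table({(0, 1): "pixel A", (0, 2): "pixel B"}))
```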

tritemio commented 9 years ago

I was thinking of moving many root parameters to setup. Other changes are:

Here is what the root and setup groups may look like:

# Root parameters, optional but recommended
/acquisition_time
/measurement_description

setup/
    num_pixels
    num_spots                integer or 'none'
    num_spectral_ch
    num_polariz_ch
    num_split_ch

    modulated_excitation        True if there is any form of excitation modulation (wavelength,
                                polarization). True also for 2 or more pulse-interleaved excitation 
                                (PIE) or ns-ALEX.
    lifetime

    excitation_cw               list of booleans indicating, for each excitation wavelength, 
                                whether the excitation is CW. 

    excitation_wavelengths      list of excitation wavelengths

    # Optional if not relevant
    detection_wavelengths       list of reference wavelengths for each detection spectral band

    excitation_polarizations    list of angles for each *excitation wavelength*
    detection_polarizations     list of angles for each *detection channel*

    excitation_powers

    detection_splits_ratios     list of power fractions detected by each "split" channel 
                                (i.e. detection channels generated by beam splitting 
                                through a non-polarizing beam splitter)
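
Several of the setup fields above are described as lists with one entry per excitation wavelength or per spectral band, so their lengths can be cross-checked. The sketch below does this on a plain dict standing in for the setup group. The helper name is hypothetical, and treating excitation_powers as a per-wavelength list is an assumption, since the listing gives it no description.

```python
# Fields assumed to hold one entry per excitation wavelength.
PER_EXCITATION = ("excitation_cw", "excitation_polarizations", "excitation_powers")


def check_setup(setup):
    """Return a list of length mismatches found in a setup dict."""
    problems = []
    n_exc = len(setup.get("excitation_wavelengths", []))
    for name in PER_EXCITATION:
        if name in setup and len(setup[name]) != n_exc:
            problems.append(
                f"{name}: expected {n_exc} entries, found {len(setup[name])}"
            )
    # One reference wavelength per detection spectral band.
    if "detection_wavelengths" in setup and "num_spectral_ch" in setup:
        if len(setup["detection_wavelengths"]) != setup["num_spectral_ch"]:
            problems.append("detection_wavelengths does not match num_spectral_ch")
    return problems


setup = {
    "excitation_wavelengths": [532e-9, 635e-9],
    "excitation_cw": [True],            # one entry short on purpose
    "num_spectral_ch": 2,
    "detection_wavelengths": [580e-9, 690e-9],
}
print(check_setup(setup))
```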

tritemio commented 9 years ago

I transferred all the information from this thread into the new version of the specifications:

http://photon-hdf5.readthedocs.org/en/version0.3/

Please comment before I give a shot at implementing the reference phconvert library.

smXplorer commented 9 years ago

How do I edit the file or comment, and where? X.

talaurence commented 9 years ago

This looks excellent. Please implement.

Two questions: why do you like pytables more than h5py?

Optional group “indentity” should be “identity”.

I am implementing a scanning program. I will try to produce some files in this format in the next few months.

Is Eitan Lerner still with Shimon’s group? I would like to ask him a few questions.

Thank you,

Ted

tritemio commented 9 years ago

@talaurence, typo fixed. Regarding pytables vs h5py, they are both excellent libraries. I chose pytables because it has more features and tends to be more efficient. In simple cases h5py is simpler to use, but in more elaborate cases pytables gives more flexibility: it has better compression options and advanced "search" or "query" functions.

Even Pandas (a pretty famous and highly regarded python library) uses pytables as HDF5 backend (I think they switched from h5py).

tritemio commented 9 years ago

@talaurence, regarding your scanning program I hope we can come up with a robust implementation of the Photon-HDF5 read/write library that can be useful to both of us.

tritemio commented 9 years ago

I think all points here have been addressed. The latest version of the format is 0.4 to date.

Closing.