Closed tritemio closed 9 years ago
On second thought, I prefer the following layout. The concept does not change but in this manner, we can share "logic" between the different measurement types:
/photon_data
    timestamps
    detectors
    ...
    measurement_specs/
        measurement_type: "smFRET", "smFRET-usALEX", etc...
        detectors_specs/
            # For all 2-color detection measurements
            donor
            acceptor
            # For all 3-color detection measurements
            blue
            green
            red
            # For measurements that record polarization
            polarization1
            polarization2
        # For us-ALEX, 2, 3 or N colors
        alex_period
        # For ns-ALEX (or lifetime with no alternation, i.e. simulations)
        laser_pulse_rate
        # For 2-color us-ALEX and ns-ALEX (optional)
        alex_period_donor
        alex_period_acceptor
        # For 3-color us-ALEX and ns-ALEX (optional)
        alex_period_blue
        alex_period_green
        alex_period_red
[X: What happens when you have spectral and polarization information?]
To answer Xavier's questions:
/num_spots
number of parallel excitation/detection spots [X: the notion of spot might not be the most general one. What about "spatial source" instead?]
"Spatial source" is more opaque in the most common case. Let's call it "num_spots" and describe it as "number of parallel excitation and/or detection spots". This is general enough: with any kind of illumination and point detectors you have "spots". With widefield detectors (H33D-like) it would be the number of pixels.
/alex
(boolean) True (i.e. = 1) when the measurement uses any form of alternated/interleaved excitation at different wavelengths (us-ALEX, ns-ALEX or PIE, PAX) [X: what about when there are multiple excitations but not alternated? e.g. Klenerman's group]
With no alternation alex == False. The number of excitation wavelengths can be another field, currently /setup/excitation_wavelengths (list of wavelengths).
Anyway, the complete description of 2 CW excitations would probably require a dedicated entry in measurement_specs.
/lifetime
(boolean) True (i.e. = 1) when the data contains nanotime information [X: what about when there is "virtual" information? e.g. when the timestamps are provided with enough precision and are correlated to laser excitation such that lifetime analysis is possible]
Which hardware provides it? PicoQuant and Becker&Hickl have separate fields. In the case of PicoQuant you can combine them to obtain a high-resolution timestamp, but that is a post-processing step.
[X: what do you mean by 'named after'? as in "GW Bush was named after his father GH Bush"?]
Yes, but that was in the old struck-out draft, which I have now deleted.
[X: Not sure BGR is the name you want to give to them. Maybe wavelength_n, n = 1, 2, 3?]
BGR was suggested by Ted. Either way is fine with me; I would just use the most common nomenclature.
detectors can be a 1D array or a 2D array. Each row always represents the detector(s) of one timestamp (represented by a single ID or an n-tuple of IDs). [X: what is an ID? A detector spec?]
No, it is one element (or one row) in the array detectors. It is just a way to refer to an n-tuple of numbers that identifies a detector. In the common case it will be a 1-element tuple, just an integer. For a 2-D detector it may be a pair of integers (X, Y).
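A minimal sketch of the ID convention described above, using plain Python lists in place of HDF5 arrays. The helper name `detector_ids` is hypothetical and is not part of any Photon-HDF5 library; it only illustrates that a 1D `detectors` array and a 2D one can both be read as one ID tuple per photon.

```python
def detector_ids(detectors):
    """Return one ID tuple per photon, whatever the shape of `detectors`."""
    ids = []
    for row in detectors:
        if isinstance(row, (list, tuple)):
            ids.append(tuple(row))   # 2D array: the row is already an n-tuple
        else:
            ids.append((row,))       # 1D array: wrap the single integer
    return ids


# 1D case: two point detectors with IDs 0 and 1
print(detector_ids([0, 1, 1, 0]))      # [(0,), (1,), (1,), (0,)]
# 2D case: a pixel detector addressed by (X, Y)
print(detector_ids([[3, 7], [3, 8]]))  # [(3, 7), (3, 8)]
```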
[X: What happens when you have spectral and polarization information?]
What happens? Nothing, it is fine. We can also define a dedicated measurement_type string for this case. But since you can have many combinations (usALEX + polarization, nsALEX + polarization, usALEX-3C + polarization) I would lean towards not indicating the polarization in measurement_type; if the info is there it can be used.
[X: Why abbreviate polarization into "polariz"?]
+1 No abbreviation, I will replace it.
More answers:
[X: why should they be orthogonal?]
They generally are, but they need not be. I'll call them independent polarizations.
/acquisition_time
measurement duration [X: define. Is it the difference between the last and first timestamp or some different measure?]
It is a matter of implementation; it should be how long the acquisition lasts. PicoQuant and Becker&Hickl provide a field in their formats containing the "programmed" acquisition time in ms or s. In these cases I use this info when converting. Otherwise I compute last - first timestamp.
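The rule above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the `programmed_time` argument are hypothetical, timestamps are assumed to be sorted integers, and `timestamps_unit` is assumed to be the duration of one timestamp tick in seconds.

```python
def acquisition_time(timestamps, timestamps_unit, programmed_time=None):
    """Acquisition duration in seconds.

    timestamps      -- sequence of integer timestamps, sorted in time
    timestamps_unit -- duration of one timestamp tick, in seconds (assumed)
    programmed_time -- duration reported by the acquisition hardware, in
                       seconds, or None when the source file has no such field
    """
    if programmed_time is not None:
        # Prefer the value stored by the vendor format when it exists
        return float(programmed_time)
    if len(timestamps) == 0:
        return 0.0
    # Fallback: last - first timestamp, converted to seconds
    return (timestamps[-1] - timestamps[0]) * timestamps_unit


# No vendor field available: fall back to the timestamps (12.5 ns ticks)
print(acquisition_time([0, 40_000_000, 80_000_000], 12.5e-9))
# Vendor field available: it takes precedence
print(acquisition_time([0, 40_000_000, 80_000_000], 12.5e-9, programmed_time=60))
```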
For 3c-ALEX, let's go with 1,2, and 3 for the fluorophore labels. B,G, and R were used in the 3cALEX paper which is why I used it.
Ok, do you mean this?
detectors_specs/
# For all 3-color detection measurements
wavelength_1
wavelength_2
wavelength_3
Yes,
That is correct.
From: Antonino Ingargiola [mailto:notifications@github.com] Sent: Tuesday, February 17, 2015 1:35 PM To: Photon-Data/photon-hdf5 Cc: Laurence, Ted A. Subject: Re: [photon-hdf5] Proposal for Photon-HDF5 version 0.3 (#8)
What about ditching donor and acceptor and using wavelength_N for 2, 3 and 4 colors smFRET?
That is OK with me, although some might disagree. What does Xavier think?
From: Antonino Ingargiola [mailto:notifications@github.com] Sent: Tuesday, February 17, 2015 4:49 PM To: Photon-Data/photon-hdf5 Cc: Laurence, Ted A. Subject: Re: [photon-hdf5] Proposal for Photon-HDF5 version 0.3 (#8)
Let's summarize, this would be the full file structure so far:
# Root parameters, optional but recommended
/acquisition_time
/measurement_description
/num_detectors      (total number of pixels)
/num_spots
/num_spectral_ch
/num_polariz_ch
/alex
/lifetime
# Mandatory
/photon_data
    # Mandatory
    timestamps
    timestamps_specs/
        timestamps_unit
    # Optional if there is only 1 detector
    detectors
    # Optional
    nanotimes
    nanotimes_specs/
        tcspc_unit
        tcspc_range
        tcspc_num_bins
    # Optional, but if present must be complete
    measurement_specs/
        measurement_type: "smFRET", "smFRET-usALEX", etc...
        detectors_specs/
            labels    (optional) a table with 2 columns: detector ID and detector label (string).
                      For 2-color smFRET the labels should be "donor" and "acceptor".
                      When detector ID is an n-tuple, labels has n+1 columns (n for the ID
                      and 1 for the label).
            # For all 2-color detection (or more) measurements
            spectral_ch1
            spectral_ch2
            ...
            # For measurements that record polarization
            polarization_ch1
            polarization_ch2
            # When the detection path is split in 2 ch. through a non-polarizing beam splitter
            split_ch1
            split_ch2
        # For us-ALEX, 2, 3 or N colors
        alex_period
        # For ns-ALEX (or lifetime with no alternation)
        laser_pulse_rate
        # For 2-color (or more) us-ALEX and ns-ALEX (optional)
        alex_period_spectral_ch1
        alex_period_spectral_ch2
        ...
# Mandatory
identity/
    filename
    full_filename
    creation_time
    software
    software_version
    format_name       ALWAYS: "Photon-HDF5"
    format_version    for now "0.3" (string)
# Optional but recommended
provenance/
    author
    affiliation
    filename
    full_filename
    creation_time
    modification_time
    software
    software_version
# Optional
setup/
    excitation_wavelengths      list of excitation wavelengths
    detection_wavelengths       list of reference wavelengths for each detection spectral band
    excitation_polarizations    list of angles for each *excitation wavelength*
    detection_polarizations     list of angles for each *detection channel*
    excitation_powers
    detection_splits_ratios     list of power fractions detected by each "split" channel
                                (i.e. detection channels generated by beam splitting
                                through a non-polarizing beam splitter)
# Optional
sample/
    num_dyes       (integer) number of different dyes present in the samples.
    dye_names      (array of string) list of dye names (for example: ['ATTO550', 'ATTO647N'])
    buffer_name    (string) free-form description
    sample_name    (string) free-form description
For wavelengths, spectral bands and dye names the convention is to list them in increasing order of reference wavelength (blue to red).
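As a rough illustration of the "Mandatory" markers in the layout above, the sketch below checks a file represented as nested Python dicts (a stand-in for HDF5 groups, not an actual HDF5 reader). The names `MANDATORY_PATHS`, `lookup` and `missing_fields` are hypothetical. The thread does not say which individual identity fields are required, so checking only format_name and format_version is an assumption.

```python
# Assumed minimal set of mandatory paths, taken from the layout above
MANDATORY_PATHS = [
    "photon_data/timestamps",
    "photon_data/timestamps_specs/timestamps_unit",
    "identity/format_name",
    "identity/format_version",
]


def lookup(tree, path):
    """Follow a '/'-separated path in nested dicts; return None if absent."""
    node = tree
    for name in path.split("/"):
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node


def missing_fields(tree):
    """List the mandatory paths that are absent from `tree`."""
    return [p for p in MANDATORY_PATHS if lookup(tree, p) is None]


example = {
    "photon_data": {
        "timestamps": [10, 25, 31],
        "timestamps_specs": {"timestamps_unit": 12.5e-9},
    },
    "identity": {"format_name": "Photon-HDF5", "format_version": "0.3"},
}
print(missing_fields(example))  # []
```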
Comments:
Another thing that needs to be accounted for is the presence of non-polarizing beam splitters in the setup. In that case, two (or more) channels will have identical wavelength. The more I think of it, the more I'd like to have a nice graphical interface (with a few preset templates) to define what the setup looks like :)
@smXplorer, point by point reply:
spectral_ch1 is donor, spectral_ch2 is acceptor. We have to enforce (in the reference implementation) that the convention is respected. But it is kind of intuitive that spectral_ch1 < spectral_ch2, so it is not adding a big burden on the user IMHO.
measurement_specs group, we can rename alex to modulated_excitation (or something similar) that will be True for any sort of excitation modulation/alternation and False only with CW excitation.
modulated_excitation would be True and you need to also specify a measurement_specs to completely characterize the measurement.
identity group
I agree that the case of "presence of non-polarizing beam splitters" is really important and must be included.
Following the same logic used for spectral and polarization channels, we can add some new detectors_specs entries to indicate the detectors in each beam-split channel. What about these names:
detectors_specs/
    split_ch1
    split_ch2
    ...
And for completeness we could add in /setup:
setup/
    detection_splits_ratios    list of power fractions detected by each "split" channel
                               (i.e. detection channels generated by beam splitting
                               through a non-polarizing beam splitter)
And to complete the detectors_specs we can also add a labels field that would associate a string label to each channel. This field can store the detector labels found in .SM files or any other label (for example donor/acceptor) the user wants to set. We could also recommend assigning the labels donor and acceptor to standard 2-color smFRET experiments.
One use case would be a comparative study of different detector types. In this case the label can contain the detector name.
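A small sketch of how such a labels table could be read, assuming each row holds the n components of the detector ID followed by the label (so n+1 columns in total). Plain tuples stand in for the HDF5 table, and `labels_to_dict` is a hypothetical helper name.

```python
def labels_to_dict(labels_table):
    """Map each detector ID (as a tuple) to its label.

    Every row holds the n components of the detector ID followed by the label.
    """
    return {tuple(row[:-1]): row[-1] for row in labels_table}


# Single-integer IDs: 2 columns (ID, label)
labels = [(0, "donor"), (1, "acceptor")]
print(labels_to_dict(labels))        # {(0,): 'donor', (1,): 'acceptor'}

# (X, Y) pixel IDs: 3 columns (X, Y, label)
pixel_labels = [(3, 7, "donor"), (3, 8, "acceptor")]
print(labels_to_dict(pixel_labels))  # {(3, 7): 'donor', (3, 8): 'acceptor'}
```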
I was thinking of moving many root parameters to setup. Other changes are:
alex to modulated_excitation
num_split_ch
excitation_cw
Here is how the root and setup groups may look:
# Root parameters, optional but recommended
/acquisition_time
/measurement_description
setup/
    num_pixels
    num_spots                   integer or 'none'
    num_spectral_ch
    num_polariz_ch
    num_split_ch
    modulated_excitation        True if there is any form of excitation modulation (wavelength,
                                polarization). True also for 2 or more pulse-interleaved excitation
                                (PIE) or ns-ALEX.
    lifetime
    excitation_cw               list of booleans indicating, for each excitation wavelength,
                                whether the excitation is CW.
    excitation_wavelengths      list of excitation wavelengths
    # Optional if not relevant
    detection_wavelengths       list of reference wavelengths for each detection spectral band
    excitation_polarizations    list of angles for each *excitation wavelength*
    detection_polarizations     list of angles for each *detection channel*
    excitation_powers
    detection_splits_ratios     list of power fractions detected by each "split" channel
                                (i.e. detection channels generated by beam splitting
                                through a non-polarizing beam splitter)
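To make the proposed setup fields concrete, here is a hedged sketch for the case raised earlier in the thread: two CW lasers that are on at the same time, with no alternation. The dict stands in for the setup group, `check_setup` is a hypothetical helper, and expressing wavelengths in meters is an assumption (the thread does not state a unit).

```python
def check_setup(setup):
    """Return a list of problems found in a setup dict (empty if none)."""
    problems = []
    wavelengths = setup["excitation_wavelengths"]
    # excitation_cw holds one boolean per excitation wavelength
    if len(setup["excitation_cw"]) != len(wavelengths):
        problems.append("excitation_cw needs one boolean per excitation wavelength")
    # Convention from the thread: list wavelengths from blue to red
    if list(wavelengths) != sorted(wavelengths):
        problems.append("excitation_wavelengths should go from blue to red")
    return problems


# Two CW lasers, both always on: nothing is modulated
setup = {
    "num_spectral_ch": 2,
    "num_polariz_ch": 1,
    "modulated_excitation": False,
    "lifetime": False,
    "excitation_cw": [True, True],
    "excitation_wavelengths": [532e-9, 635e-9],  # meters (assumed unit)
}
print(check_setup(setup))  # []
```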
I transferred all the information from this thread into the new version of the specifications:
http://photon-hdf5.readthedocs.org/en/version0.3/
Please comment before I give a shot at implementing the reference phconvert library.
How do I edit the file or comment and where? X.
From: "Antonino Ingargiola" notifications@github.com To: "Photon-Data/photon-hdf5" photon-hdf5@noreply.github.com Cc: "smXplorer" michalet@chem.ucla.edu Sent: Monday, March 9, 2015 11:01:37 AM Subject: Re: [photon-hdf5] Proposal for Photon-HDF5 version 0.3 (#8)
Xavier Michalet, D. Sc. Department of Chemistry and Biochemistry, UCLA Young Hall-2002, 607 Charles E. Young Drive East Los Angeles, CA 90095-1569 Ph: (310) 794-6693 (off)/6685(lab); Fax: (310) 267-4672 Email: michalet@chem.ucla.edu
This looks excellent. Please implement.
Two questions: why do you like pytables more than h5py?
Optional group “indentity” should be “identity”.
I am implementing a scanning program. I will try to produce some files in this format in the next few months.
Is Eitan Lerner still with Shimon’s group? I would like to ask him a few questions.
Thank you,
Ted
From: Antonino Ingargiola [mailto:notifications@github.com] Sent: Monday, March 09, 2015 11:02 AM To: Photon-Data/photon-hdf5 Cc: Laurence, Ted A. Subject: Re: [photon-hdf5] Proposal for Photon-HDF5 version 0.3 (#8)
@talaurence, typo fixed. Regarding pytables vs h5py, they are both excellent libraries. I chose pytables because it has more functionality and tends to be more efficient. In simple cases h5py is simpler to use, but in more elaborate cases pytables gives more flexibility: it has better compression options and advanced "search" or "query" functions.
Even Pandas (a pretty famous and highly regarded python library) uses pytables as HDF5 backend (I think they switched from h5py).
@talaurence, regarding your scanning program I hope we can come up with a robust implementation of the Photon-HDF5 read/write library that can be useful to both of us.
I think all points here have been addressed. The latest version of the format is 0.4 to date.
Closing.
This document summarizes the new layout proposed for version 0.3 of the Photon-HDF5 file format.
The format includes a mandatory set of root fields, and at least a photon-data group.
Root group
Measurement description fields
The following parameters in the root group are present in all types of measurement. They are sufficiently general to warrant their inclusion in all files. Measurement-specific information necessary to interpret the data is provided in measurement-specific group(s) described later.
/num_spots
number of parallel excitation and/or detection spots. When using point-like detectors, this number indicates the number of detection PSFs, even if the excitation does not involve "spots". With a widefield detector this field loses any meaning and its value should become 'undefined', or the field should be removed altogether.
/alex
(boolean) True (i.e. = 1) when the measurement uses any form of alternated/interleaved excitation at different wavelengths (us-ALEX, ns-ALEX or PIE, PAX). When there is no periodic alternation of the excitation wavelength this field must be False.
/lifetime
(boolean) True (i.e. = 1) when the data explicitly contains a nanotime array. In the case of high-resolution timestamps with potential lifetime-resolving resolution this field will still be False.
/num_spectral_ch
(integer) number of different spectral bands in the detection channels (e.g. 2 for 2-color smFRET)
/num_polariz_ch
(integer) number of different polarizations in the detection channels. The value is 1 if no polarization selection is performed and 2 if two polarizations are recorded.
/acquisition_time
measurement duration [X: define. Is it the difference between the last and first timestamp or some different measure?]
/measurement_description
(string): free-form user-supplied description of the measurement.
Note 1: to be decided if all these parameters must be considered mandatory.
Note 2: having these root parameters allows us to support "incomplete" or "test" measurements that are not smFRET and/or ALEX. For example, the multi-spot preliminary measurements with one laser excitation and one detection band. These simple "test" measurements don't need to have a separate measurement group because these root parameters are enough for interpretation. In other words we keep the simple case simple.
Measurement type
The field:
contains a string that identifies the measurement type. Examples of possible values are:
"smFRET"
"smFRET us-ALEX"
"smFRET ns-ALEX"
"smFRET us-ALEX-3C"
Measurement group:
This group, named measurement_specs, is placed inside the photon_data group so the multispot case is easily supported.
[X: Not sure BGR is the name you want to give to them. Maybe wavelength_n, n = 1, 2, 3?]
photon-data group layout:
Notes
detectors can be a 1D array or a 2D array. Each row always represents the detector(s) of one timestamp (represented by a single ID or an n-tuple of IDs).
The detectors_specs group is now in measurement_specs and defines the associations between detectors and spectral bands or polarization bands.
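The association described in the notes can be used as in the following sketch. It assumes single-integer detector IDs, plain lists in place of HDF5 arrays, and a dict standing in for the detectors_specs group; `timestamps_in_channel` is a hypothetical helper, not a function of the reference library.

```python
def timestamps_in_channel(timestamps, detectors, channel_detectors):
    """Timestamps of photons recorded by any detector of one channel."""
    wanted = set(channel_detectors)
    return [t for t, d in zip(timestamps, detectors) if d in wanted]


# Stand-in for measurement_specs/detectors_specs: channel name -> detector IDs
detectors_specs = {"spectral_ch1": [0], "spectral_ch2": [1]}

timestamps = [10, 25, 31, 47, 52]
detectors = [0, 1, 1, 0, 1]   # one detector ID per timestamp

donor = timestamps_in_channel(timestamps, detectors, detectors_specs["spectral_ch1"])
acceptor = timestamps_in_channel(timestamps, detectors, detectors_specs["spectral_ch2"])
print(donor)     # [10, 47]
print(acceptor)  # [25, 31, 52]
```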