OpenSMFS / FRETBursts

Burst analysis software for single and multi-spot single-molecule FRET (smFRET) data.
https://opensmfs.github.io/FRETBursts/
GNU General Public License v2.0

nsALEX bugs #12

Closed AndersBarth closed 6 years ago

AndersBarth commented 6 years ago

Dear FRETBurst team,

I ran into some bugs when I tried to use the software to analyse nsALEX/PIE data.

Initially, I tried to load MFD-PIE data with polarization, but ran into an issue on lines 234-235 in loader.py.

https://github.com/OpenSMFS/FRETBursts/blob/c2c284d74c34e5c69b6ce51f591928da3e40660c/fretbursts/loader.py#L232-L237

The variables spectral_ch1 and spectral_ch2 both contain two values, belonging to the two polarizations, which causes the call to np.asscalar to fail.

I then processed my data by combining the two polarizations to only have a single channel per color, but it still errored. I fixed it by changing the lines to:

```python
donor = meas_specs.detectors_specs.spectral_ch1.read()[0]
accept = meas_specs.detectors_specs.spectral_ch2.read()[0]
```

This will, however, not handle situations where two channels are used per color, as in polarized detection. How could this situation be fixed?
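For what it's worth, a minimal numpy sketch of how the loader could handle multi-element spectral_ch fields (the detector IDs and photon array here are made up; np.isin replaces the scalar comparison that np.asscalar implies):

```python
import numpy as np

# Hypothetical detector IDs: two donor pixels and two acceptor pixels,
# one per polarization. These would come from spectral_ch1/2.read().
donor_ids = np.array([0, 1])
accept_ids = np.array([2, 3])

# Made-up detector ID recorded for each photon
detectors = np.array([0, 2, 1, 3, 0, 3])

# Boolean masks selecting donor/acceptor photons, regardless of polarization
Dem = np.isin(detectors, donor_ids)
Aem = np.isin(detectors, accept_ids)
```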

Next, I ran into an issue when calling nsalex_apply_period in lines 553-560.

https://github.com/OpenSMFS/FRETBursts/blob/c2c284d74c34e5c69b6ce51f591928da3e40660c/fretbursts/loader.py#L553-L560

The arrays d_ex_mask_t and a_ex_mask_t are initialized to size (N,), but the logical array that is added to them seems to be of size (N,1).

I could fix the error by replacing lines 554 and 558 with:

```python
d_ex_mask_t = np.zeros((d.nanotimes_t[ich].size, 1), dtype='bool')
a_ex_mask_t = np.zeros((d.nanotimes_t[ich].size, 1), dtype='bool')
```
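For context, a small numpy sketch (with made-up sizes) of why the (N,) vs (N, 1) mismatch errors: in-place addition cannot broadcast the (N, N) result back into the 1-D array.

```python
import numpy as np

n = 4
mask_1d = np.zeros(n, dtype=bool)          # shape (4,), as initialized in loader.py
period_mask = np.ones((n, 1), dtype=bool)  # shape (4, 1), as from an (N, 1) nanotimes array

# Adding a (n, 1) array to a (n,) array broadcasts to (n, n),
# which cannot be stored back into the (n,) operand -> ValueError
try:
    mask_1d += period_mask
except ValueError as e:
    print("broadcast error:", e)

# The out-of-place sum silently becomes a (n, n) matrix instead of a vector
print((np.zeros(n, dtype=bool) + period_mask).shape)  # (4, 4)
```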

Please let me know if this solution is acceptable.

Best, Anders

tritemio commented 6 years ago

@AndersBarth, thanks for your reports.

We never used PIE files with polarization with FRETBursts, so these errors are somewhat "expected". But we will certainly be happy to add support for PIE+polarization data.

Regarding the first question: the workaround you used should work but, as you said, it neglects the second polarization. The real fix would be extending the loading logic to handle the additional polarization information. It is correct to have two values in spectral_ch1/2; you should also have two values in polarization_ch1/2. With these 4 fields you can load the data properly. Currently, FRETBursts distinguishes between D- and A-detected photons using the boolean masks Data.Dem and Data.Aem. These masks are complementary and indicate whether each photon is detected by the D or A channel. The natural way to support polarization would be to add two more masks, for example Data.Pem and Data.Sem, storing whether each photon is detected by the P or S polarization channel (regardless of whether it is a D or A channel). With these 4 masks you can build any arbitrary photon selection. For example, D photons with S polarization would be selected with Dem[0]*Sem[0] (the zero index is a FRETBursts artifact due to supporting multispot data). Does that sound like a good plan?
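To illustrate the proposed four-mask scheme, a minimal sketch with made-up masks (the Pem/Sem names are only a suggestion, and the [0] index mimics FRETBursts' multispot convention):

```python
import numpy as np

# Hypothetical per-photon masks for a single spot
Dem = [np.array([True, True, False, False, True])]   # donor spectral channel
Sem = [np.array([True, False, True, False, True])]   # S polarization channel

# Donor photons detected in the S polarization channel
d_s_photons = Dem[0] * Sem[0]
print(d_s_photons)  # [ True False False False  True]
```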

Regarding the second error: I am surprised by this one. It seems like the array d.nanotimes_t[ich] has an additional dimension of size 1 (i.e. it has shape (n, 1) instead of (n,)). Can you print d.nanotimes_t[ich].shape to confirm? If so, it is probably an issue with the input file. Can you also print what's in the nanotimes array in the HDF5 file? Can you attach the data file here (drag and drop on the comment box)?

```python
import tables

h5file = tables.open_file(filename)
print(h5file.root.photon_data.nanotimes)
```
AndersBarth commented 6 years ago

The second error was indeed a problem in how I was generating the file from MATLAB.

I called h5create with the size of the array, when I should have passed just the number of elements as an integer.

```matlab
% wrong:
h5create('temp/photon_data.h5', '/timestamps', size(timestamps), 'Datatype', 'int64')
% correct:
h5create('temp/photon_data.h5', '/timestamps', numel(timestamps), 'Datatype', 'int64')
```

Would it be possible to add verification for this in the phforge script to avoid this error?

AndersBarth commented 6 years ago

Output from your code example now reads:

```
/photon_data/nanotimes (CArray(1506894,), shuffle, zlib(6)) b'TCSPC photon arrival time (nanotimes). Units and other specifications are in nanotimes_specs group.'
```

AndersBarth commented 6 years ago

With that, I think the issue is addressed.

One suggestion would be, however, to catch the case where spectral_ch1/2 have multiple elements, as in my case, and give the user an informative message instead of the error.

tritemio commented 6 years ago

Absolutely, I will put more checks in place. It's always good to hear about new use cases.

For the record, which version of phconvert are you using? I strongly recommend phconvert 0.8, which supports Photon-HDF5 v0.5. With Photon-HDF5 0.5, a file with polarization has measurement_type = 'generic', and the fields in /setup describe the type of measurement. The latest phconvert also decodes more input data files, so if you are using Becker&Hickl or Picoquant files, they can be converted to Photon-HDF5 directly.

I am preparing new releases of phconvert and FRETBursts, so it's a good time to sneak in fixes and more checks.

AndersBarth commented 6 years ago

I am running phconvert 0.8. Thanks for the information.

> The latest phconvert also decodes more input data files, so if you are using Becker&Hickl or Picoquant files, they can be converted to Photon-HDF5 directly.

That is a really nice addition! Very appreciated. For B&H data, does it support measurements done using multiple cards, i.e. can you specify multiple input files?

tritemio commented 6 years ago

Good to hear you are using a recent phconvert. I just released phconvert 0.8.1 that, among other things, has the additional check for the shape of photon_data arrays you suggested. See phconvert release notes.

To create a Photon-HDF5 file with phconvert, you first load the data, usually decoding the input file with a routine provided by phconvert itself. Then you add some metadata and save the file using phconvert.hdf5.save_photon_hdf5. There are several example notebooks in the phconvert repository. If you prefer writing the metadata to a YAML file, there is a notebook example for that as well (the YAML approach makes batch conversion easier).

If you have many input files, for example for different detectors, you have to load the files, merge the timestamps and then save the Photon-HDF5 as usual. For "merging" of timestamps you can do something like:

https://github.com/multispot-software/niconverter/blob/a1852043e1b41fde721143bfbae40f32c4f87fc9/niconverter/niconverter.py#L458-L464
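In case it helps, a minimal numpy sketch of that merge step, assuming two already-synchronized cards (the timestamps and detector IDs here are made up):

```python
import numpy as np

# Hypothetical timestamps from two acquisition cards sharing one clock
ts1 = np.array([10, 40, 90], dtype=np.int64)
ts2 = np.array([20, 30, 100], dtype=np.int64)
det1 = np.zeros(ts1.size, dtype=np.uint8)   # detector IDs for card 1
det2 = np.ones(ts2.size, dtype=np.uint8)    # detector IDs for card 2

# Concatenate, then sort both arrays by time with a single argsort
timestamps = np.hstack([ts1, ts2])
detectors = np.hstack([det1, det2])
order = timestamps.argsort(kind='mergesort')  # stable sort keeps the order of ties
timestamps = timestamps[order]
detectors = detectors[order]
print(timestamps)  # [ 10  20  30  40  90 100]
```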

Let me know if you have any issue with the above instructions.

BTW, it would be great to add such an example to phconvert notebooks.

tritemio commented 6 years ago

No reply, closing.