Photon-HDF5 / phconvert

Convert Becker & Hickl, PicoQuant and other formats to Photon-HDF5
http://photon-hdf5.github.io/phconvert

Can't read HydraHarpV2T3 header #35

Closed · Tomkaehst closed this 5 years ago

Tomkaehst commented 5 years ago

Hello phconvert developers,

I'm trying to read a .ptu file from a PicoQuant HydraHarp 2 (record type: 16843524) using load_ptu(), and I get this error in _ptu_read_tag():

~/miniconda3/lib/python3.6/site-packages/phconvert/pqreader.py in _ptu_read_tag(s, offset, tag_type_r)
    658     # Some tag types have additional data
    659     if tag['type'] == 'tyAnsiString':
--> 660         tag['data'] = s[offset: offset + tag['value']].rstrip(b'\0').decode()
    661         offset += tag['value']
    662     elif tag['type'] == 'tyFloat8Array':

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 32: invalid start byte
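
For reference, the call that triggers this is roughly the following (the file name is just a placeholder):

from phconvert import pqreader

# Placeholder file name for the HydraHarp 2 .ptu file described above;
# load_ptu() raises the UnicodeDecodeError while parsing the tag header.
pqreader.load_ptu('measurement.ptu')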

With the readPTU script from PicoQuant's GitHub page I had a similar error and resolved it by changing the encoding from utf-8 to utf-16, but that did not help here.

Does anyone know what might cause the issue?

Thanks in advance, Tom

tritemio commented 5 years ago

@Tomkaehst, thanks for the report. Please provide the example data file so I can look into it.

Tomkaehst commented 5 years ago

Hi @tritemio,

you can find an example file here: https://upload.uni-jena.de/data/5caa54291eaf20.02785906/Coumarin6_in_EtOH_2_1.ptu

In the meantime, I tried commenting out the ANSIString assignment to tag['data'], and everything now works as expected. The decoding of the "File_Comment" tag seems to be the problem.
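
Roughly, the workaround looks like this in _ptu_read_tag() (a sketch based on the snippet in the traceback above, not an actual patch):

    # Some tag types have additional data
    if tag['type'] == 'tyAnsiString':
        # Temporary workaround: skip decoding the string payload so that a
        # malformed "File_Comment" does not abort reading the whole file.
        # tag['data'] = s[offset: offset + tag['value']].rstrip(b'\0').decode()
        offset += tag['value']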

tritemio commented 5 years ago

@Tomkaehst, right, the File_Comment tag contains this raw byte string:

b'LAS X 2.0.1.14392\r\n\r\nPinhole: 58.69 \xb5m\r\nObjective: HC FLUOTAR L 25.0 WATER\r\nImage Format: 512 x 512\r\nScan Speed: 100 Hz\r\nZoom: 1.4\r\nFrame Average: 100\r\nDirection: Unidirectional\r\n\r\nWLL\r\n LaserLine 488: 75.0\r\n Laser Shutter: Open\r\n\r\nLaser (WLL, WLL) On 70.0\r\nLaser (Argon, visible) Off 0.0\r\nLaser (IR, MP) On\r\nLaser (IR2, FSOPO) On\r\nMFP Filter: Substrate \r\nPolarization Filter: NF 488\r\nNotch Filter: Empty\r\nX1-Port: Mirror \r\nScan Mode: xyt\r\nZPosition: -1.60 \xb5m\r\nTime Cycle Count: 25 ; Cycle Time: 600.0 s ; Complete Time: 14916.0 s\r\nSpectral detection range\r\nSP PMT 1: 500...550nm \r\n\r\nFLIM Detector: Intern\r\nAcquisition Mode: Frame Repetition 100\r\n\x00'

This is not valid UTF-8: if you try to decode it as UTF-8 you get exactly the error you reported, for byte 0xb5 at position 32.

The byte is printed as \xb5 in the string above, and it clearly should be a µ.
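
On its own that byte is already invalid UTF-8, since 0xb5 lies in the 0x80–0xbf range that UTF-8 reserves for continuation bytes and therefore can never start a character:

>>> b'\xb5'.decode()
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte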

We can ask Python what the correct UTF-8 encoding of μ is:

>>> 'μ'.encode()
b'\xce\xbc'

Asking Google, I found this (\xb5 is U+00B5, MICRO SIGN):

Unicode string:
  '\xb5'
UTF8 bytestring:
  b'\xc2\xb5'

And if I try to decode this in Python:

>>> b'\xc2\xb5'.decode()
'µ'

This is the micro sign µ, which looks slightly slanted in notebooks but not here on GitHub, so the rendering is font-dependent.

Bottom line, I think PicoQuant saved a broken string here... or maybe they are not using UTF-8 but some legacy encoding. Let me guess: they are from Germany, so let's try latin1:

>>> print(s.rstrip(b'\0').decode('latin1'))
LAS X 2.0.1.14392

Pinhole: 58.69 µm
Objective: HC FLUOTAR L 25.0 WATER
Image Format: 512 x 512
Scan Speed: 100 Hz
Zoom: 1.4
Frame Average: 100
Direction: Unidirectional

WLL
 LaserLine 488: 75.0
 Laser Shutter: Open

Laser (WLL, WLL) On 70.0
Laser (Argon, visible) Off 0.0
Laser (IR, MP) On
Laser (IR2, FSOPO) On
MFP Filter: Substrate 
Polarization Filter: NF 488
Notch Filter: Empty
X1-Port: Mirror 
Scan Mode: xyt
ZPosition: -1.60 µm
Time Cycle Count: 25 ; Cycle Time: 600.0 s ; Complete Time: 14916.0 s
Spectral detection range
SP PMT 1: 500...550nm 

FLIM Detector: Intern
Acquisition Mode: Frame Repetition 100

Bingo, string decoded.
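
(It works because latin1 maps every byte 0x00–0xff directly to the Unicode code point with the same value, so 0xb5 simply becomes U+00B5:)

>>> b'\xb5'.decode('latin1')
'µ'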

Bottom line: PQ uses latin1 string encoding here. I don't know whether they use latin1 everywhere. Unless PQ confirms that they always have used, and will continue to use, latin1, I would add a try..except that first tries UTF-8 and falls back to latin1 on error.
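
Something along these lines, just a sketch (the helper name _decode_tag_string is made up):

def _decode_tag_string(raw):
    """Decode a PTU string tag, trying UTF-8 first and falling back to latin1."""
    raw = raw.rstrip(b'\0')
    try:
        return raw.decode()          # most files should be valid UTF-8/ASCII
    except UnicodeDecodeError:
        return raw.decode('latin1')  # e.g. 0xb5 -> µ in the File_Comment above

and then in _ptu_read_tag():

    if tag['type'] == 'tyAnsiString':
        tag['data'] = _decode_tag_string(s[offset: offset + tag['value']])
        offset += tag['value']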

Tomkaehst commented 5 years ago

Thank you very much for the quick response @tritemio !

tritemio commented 5 years ago

Closed by #36