cgohlke / czifile

Read Carl Zeiss(r) Image (CZI) files
https://pypi.org/project/czifile

Handling escaped XML metadata #2

Closed andreasg123 closed 3 years ago

andreasg123 commented 3 years ago

A CZI file has this metadata for a subblock:

<METADATA><Tags><AcquisitionTime>2021-03-03T12:51:11.0123443Z</AcquisitionTime><ImageScaling>&lt;ImageScaling&gt;
  &lt;ImagePixelSize&gt;6.5,6.5&lt;/ImagePixelSize&gt;
&lt;/ImageScaling&gt;</ImageScaling><DetectorState>&lt;CameraState&gt;
  &lt;ApplyCameraProfile&gt;false&lt;/ApplyCameraProfile&gt;
  &lt;ApplyImageOrientation&gt;true&lt;/ApplyImageOrientation&gt;
  &lt;ExposureTime&gt;80005705.882353&lt;/ExposureTime&gt;
  &lt;Frame&gt;128,128,2048,2048&lt;/Frame&gt;
  &lt;ImageOrientation&gt;3&lt;/ImageOrientation&gt;
&lt;/CameraState&gt;</DetectorState><StageXPosition>+000000209505.8000</StageXPosition><StageYPosition>+000000045936.2000</StageYPosition><FocusPosition>+000000021667.7820</FocusPosition><RoiCenterOffsetX>+000000000000.0000</RoiCenterOffsetX><RoiCenterOffsetY>+000000000000.0000</RoiCenterOffsetY></Tags><DataSchema><ValidBitsPerPixel>16</ValidBitsPerPixel></DataSchema><AttachmentSchema /></METADATA>

Because part of the XML content is escaped, that part remains plain text when the metadata is converted to JSON, which is the expected behavior. Do you have any suggestions for dealing with such files?
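To make the behavior concrete, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`. The `snippet` string is a shortened, hypothetical stand-in for the subblock metadata above, not the actual file contents.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the subblock metadata shown above.
snippet = (
    "<METADATA><Tags><ImageScaling>"
    "&lt;ImageScaling&gt;&lt;ImagePixelSize&gt;6.5,6.5"
    "&lt;/ImagePixelSize&gt;&lt;/ImageScaling&gt;"
    "</ImageScaling></Tags></METADATA>"
)

root = ET.fromstring(snippet)
scaling = root.find("Tags/ImageScaling")

# The parser decodes the entities, but the result is character data,
# so the outer element has no child elements, only a text string.
print(len(scaling))
print(scaling.text)
```

Any converter that walks the parsed tree therefore sees the inner XML as one string value, which is why it ends up as text in the JSON.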

cgohlke commented 3 years ago

html.unescape might work:

from html import unescape
from json import dumps
from tifffile import xml2dict

meta = """<METADATA><Tags><AcquisitionTime>2021-03-03T12:51:11.0123443Z</AcquisitionTime><ImageScaling>&lt;ImageScaling&gt;
    &lt;ImagePixelSize&gt;6.5,6.5&lt;/ImagePixelSize&gt;
&lt;/ImageScaling&gt;</ImageScaling><DetectorState>&lt;CameraState&gt;
    &lt;ApplyCameraProfile&gt;false&lt;/ApplyCameraProfile&gt;
    &lt;ApplyImageOrientation&gt;true&lt;/ApplyImageOrientation&gt;
    &lt;ExposureTime&gt;80005705.882353&lt;/ExposureTime&gt;
    &lt;Frame&gt;128,128,2048,2048&lt;/Frame&gt;
    &lt;ImageOrientation&gt;3&lt;/ImageOrientation&gt;
&lt;/CameraState&gt;</DetectorState><StageXPosition>+000000209505.8000</StageXPosition><StageYPosition>+000000045936.2000</StageYPosition><FocusPosition>+000000021667.7820</FocusPosition><RoiCenterOffsetX>+000000000000.0000</RoiCenterOffsetX><RoiCenterOffsetY>+000000000000.0000</RoiCenterOffsetY></Tags><DataSchema><ValidBitsPerPixel>16</ValidBitsPerPixel></DataSchema><AttachmentSchema /></METADATA>"""

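# Turn the escaped entities (&lt;, &gt;) back into markup so the nested XML becomes real elements.
meta = unescape(meta)
print(meta)
# Convert the now fully nested XML into a dict.
meta = xml2dict(meta)
print(meta)
# Serialize the dict to JSON.
meta = dumps(meta)
print(meta)

andreasg123 commented 3 years ago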

Thanks. That looks good. I'll check if that would cause any issues with our files.
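One possible issue to check for: unescaping the whole document also unescapes entities in ordinary text values, so a value such as `a &lt; b` would turn into invalid XML. A more targeted variant, sketched here under assumptions and using only the standard library, parses the document first and expands only those text values that are themselves well-formed XML. The helper name `expand_escaped_xml` and the `Note` element are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET


def expand_escaped_xml(element):
    """Replace text that is itself well-formed XML with parsed child elements."""
    for child in list(element):
        expand_escaped_xml(child)
    text = (element.text or "").strip()
    if len(element) == 0 and text.startswith("<"):
        try:
            nested = ET.fromstring(text)
        except ET.ParseError:
            return  # not well-formed XML; leave the text untouched
        element.text = None
        element.append(nested)
        # Handle content that was escaped more than once.
        expand_escaped_xml(nested)


# Shortened, hypothetical metadata; <Note> is an invented element whose text
# would break a whole-document unescape.
meta = (
    "<METADATA><Tags>"
    "<ImageScaling>&lt;ImageScaling&gt;&lt;ImagePixelSize&gt;6.5,6.5"
    "&lt;/ImagePixelSize&gt;&lt;/ImageScaling&gt;</ImageScaling>"
    "<Note>a &lt; b &amp;&amp; c</Note>"
    "</Tags></METADATA>"
)

root = ET.fromstring(meta)
expand_escaped_xml(root)
print(ET.tostring(root, encoding="unicode"))
```

The serialized result can then be passed to `xml2dict` as in the suggestion above. Whether the extra step is needed depends on whether the files contain escaped characters outside the nested XML blocks.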