IDR / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment (particularly UW-Madison LOCI and Glencoe Software). Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
http://www.openmicroscopy.org/site/products/bio-formats
GNU General Public License v2.0

ScreenReader: invalid XML character when writing OME metadata #29

Closed sbesson closed 1 year ago

sbesson commented 1 year ago

This issue was exposed by the ongoing investigation into converting IDR studies, and more specifically by the work in https://github.com/IDR/bioformats2raw/pull/1

The converter failed when writing out the OME-XML under OME/METADATA.ome.xml, with the following error:

$ ./bioformats2raw-0.7.0-SNAPSHOT/bin/bioformats2raw /uod/idr/metadata/idr0035-caie-drugresponse/screens/Week10_40111.screen /tmp/Week10_40111.ome.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp7079964177449063336/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
[Fatal Error] :1:84: Character reference "&#0" is an invalid XML character.
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@63a65a25): java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 84; Character reference "&#0" is an invalid XML character.
    at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
    at picocli.CommandLine.call(CommandLine.java:2761)
    at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2192)
Caused by: java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 84; Character reference "&#0" is an invalid XML character.
    at ome.xml.model.XMLAnnotation.asXMLElement(XMLAnnotation.java:263)
    at ome.xml.model.StructuredAnnotations.asXMLElement(StructuredAnnotations.java:681)
    at ome.xml.model.OME.asXMLElement(OME.java:931)
    at ome.xml.model.OME.asXMLElement(OME.java:771)
    at ome.xml.meta.AbstractOMEXMLMetadata.dumpXML(AbstractOMEXMLMetadata.java:110)
    at ome.xml.meta.OMEXMLMetadataImpl.dumpXML(OMEXMLMetadataImpl.java:105)
    at loci.formats.ome.OMEPyramidStore.dumpXML(OMEPyramidStore.java:81)
    at loci.formats.services.OMEXMLServiceImpl.getOMEXML(OMEXMLServiceImpl.java:469)
    at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:663)
    at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:516)
    at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:107)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    ... 9 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 84; Character reference "&#0" is an invalid XML character.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at ome.xml.model.XMLAnnotation.asXMLElement(XMLAnnotation.java:259)
    ... 20 more

The same issue can be reproduced outside the converter by running the showinf command-line utility (from the IDR bioformats_package) with the -omexml option, which also attempts to print the OME-XML to stdout:

$ ./bftools/showinf -nopix -omexml /uod/idr/metadata/idr0035-caie-drugresponse/screens/Week10_40111.screen
Checking file format [Screen]
Initializing reader
ScreenReader initializing /uod/idr/metadata/idr0035-caie-drugresponse/screens/Week10_40111.screen
MetamorphReader initializing /uod/idr/filesets/idr0035-caie-drugresponse/images/Week10_40111/Week10_200907_B02_s1_w18E215662-2CF7-4739-93F3-DBD0C40B78DB.tif
Reading IFDs
...
[Fatal Error] :1:84: Character reference "&#0" is an invalid XML character.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 84; Character reference "&#0" is an invalid XML character.
    at ome.xml.model.XMLAnnotation.asXMLElement(XMLAnnotation.java:263)
    at ome.xml.model.StructuredAnnotations.asXMLElement(StructuredAnnotations.java:681)
    at ome.xml.model.OME.asXMLElement(OME.java:931)
    at ome.xml.model.OME.asXMLElement(OME.java:771)
    at ome.xml.meta.AbstractOMEXMLMetadata.dumpXML(AbstractOMEXMLMetadata.java:110)
    at ome.xml.meta.OMEXMLMetadataImpl.dumpXML(OMEXMLMetadataImpl.java:105)
    at loci.formats.ome.OMEPyramidStore.dumpXML(OMEPyramidStore.java:81)
    at loci.formats.services.OMEXMLServiceImpl.getOMEXML(OMEXMLServiceImpl.java:469)
    at loci.formats.FormatReader.setId(FormatReader.java:1422)
    at loci.formats.ImageReader.setId(ImageReader.java:849)
    at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
    at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1035)
    at loci.formats.tools.ImageInfo.main(ImageInfo.java:1121)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 84; Character reference "&#0" is an invalid XML character.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at ome.xml.model.XMLAnnotation.asXMLElement(XMLAnnotation.java:259)
    ... 12 more
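
Both traces end in XMLAnnotation.asXMLElement handing a string to a DOM parser, which rejects the character reference &#0. A minimal sketch of that parser behaviour in isolation, assuming the JDK's built-in JAXP parser rather than the Apache Xerces build seen in the traces; the class name, helper name and the Value element are illustrative and not taken from Bio-Formats:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of Bio-Formats or bioformats2raw.
final class InvalidCharRefDemo {

  /**
   * Parses an XML string with a DOM parser.
   * Returns the parser's error message, or null if the string parsed cleanly.
   */
  static String parseError(String xml) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      builder.parse(new InputSource(new StringReader(xml)));
      return null;
    } catch (SAXParseException e) {
      // Malformed input, e.g. a reference to a character XML 1.0 forbids.
      return e.getMessage();
    } catch (Exception e) {
      // Parser configuration or I/O problems are not expected here.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // U+0000 is not a legal XML 1.0 character, even as a numeric reference.
    System.out.println(parseError("<Value>&#0;</Value>"));
    // A tab (U+0009) is legal, so this call returns null.
    System.out.println(parseError("<Value>&#9;</Value>"));
  }
}
```

If this behaves as expected, the failure depends only on the content of the annotation string and not on the reader or converter that produced it.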

sbesson commented 1 year ago

An update on this investigation: using a representative TIFF file from https://idr.openmicroscopy.org/webclient/?show=plate-6301, I can reproduce the issue with the following minimal screen file:

[Plate]
Name = Week10_40111
Rows = 1
Columns = 1
Fields = 1

[Well 0]
Row = 0
Column = 0
Field_0 = Week10_200907_B02_s1_w18E215662-2CF7-4739-93F3-DBD0C40B78DB.tif

This throws an org.xml.sax.SAXParseException when running showinf -nopix -omexml. Passing -no-sas is sufficient to avoid the failure.

I cannot reproduce the issue when the screen references either a fake file or a TIFF file converted from a fake file (and including some original metadata). Also, running showinf -nopix -omexml directly on Week10_200907_B02_s1_w18E215662-2CF7-4739-93F3-DBD0C40B78DB.tif does not throw any error.
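
The reference &#0 in the error suggests that some metadata value in the real TIFF contains a NUL (U+0000) character, although the thread does not identify which one. A minimal sketch of a helper that could be used to scan candidate strings for characters outside the XML 1.0 Char production; the class and method names are hypothetical and this is not an existing Bio-Formats API:

```java
// Hypothetical diagnostic helper; not part of Bio-Formats.
final class XmlCharCheck {

  /** True if the code point is allowed by the XML 1.0 Char production. */
  static boolean isValidXml10(int cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
  }

  /**
   * Returns the index (in UTF-16 code units) of the first character
   * that XML 1.0 does not allow, or -1 if the whole string is valid.
   */
  static int firstInvalidIndex(String s) {
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      if (!isValidXml10(cp)) {
        return i;
      }
      i += Character.charCount(cp);
    }
    return -1;
  }

  /** Returns a copy of the string with all disallowed characters removed. */
  static String stripInvalid(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    s.codePoints()
        .filter(XmlCharCheck::isValidXml10)
        .forEach(sb::appendCodePoint);
    return sb.toString();
  }

  public static void main(String[] args) {
    // Illustrative value only: a plate name with a trailing NUL.
    String value = "Week10_40111\u0000";
    System.out.println(firstInvalidIndex(value));
    System.out.println(stripInvalid(value));
  }
}
```

Whether stripping such characters is the right fix, as opposed to rejecting or escaping the value, is a design decision this sketch does not settle.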