HUPO-PSI / mzML

Repository for mzML and the corresponding examples

[mzML] sourceFileList: `count=0` is valid, but violates `minOccurs="1"` #2

Closed sneumann closed 5 years ago

sneumann commented 5 years ago

Hi, we're currently discussing a test failure in our mzR package in https://github.com/sneumann/mzR/issues/192. The written file contains an empty sourceFileList, which fails schema validation:

      <sourceFileList count="0">
      </sourceFileList>

xmlSchemaValidate(mzML_xsd_idx, out_file):
 "Element '{http://psi.hupo.org/ms/mzml}sourceFileList': Missing child element(s). 
Expected is ( {http://psi.hupo.org/ms/mzml}sourceFile )."

This is triggered by a minOccurs in the XSD:

  <xs:complexType name="SourceFileListType">
    <xs:annotation>
      <xs:documentation>List and descriptions of the source files this mzML document was generated or derived from</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="unbounded" name="sourceFile" type="dx:SourceFileType"/>
    </xs:sequence>
    <xs:attribute name="count" type="xs:nonNegativeInteger" use="required">
      <xs:annotation>
        <xs:documentation>Number of source files used in generating the instance document.</xs:documentation>
      </xs:annotation>
    </xs:attribute>
  </xs:complexType>

I am wondering whether this is actually a schema error: if there are no source files, then count="0" sounds right. That value is valid for the attribute (xs:nonNegativeInteger), but the empty list violates minOccurs="1".

So either name="count" type="xs:nonNegativeInteger" should be constrained by a <minInclusive value='1'/> (https://www.w3.org/TR/xmlschema-2/#rf-minInclusive), or sourceFileList would have to be optional. If sourceFileList became optional, existing files would remain valid under such an updated schema.
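
A minimal sketch of the first option, for illustration only: it is not part of the published mzML schema. It reuses the `xs` and `dx` prefixes from the fragment quoted above and replaces the `type` attribute on `count` with an inline restriction, so that `count="0"` is rejected by the attribute itself as well as by `minOccurs="1"`.

```xml
<!-- Hypothetical variant of SourceFileListType; not the published schema. -->
<xs:complexType name="SourceFileListType">
  <xs:sequence>
    <!-- Unchanged: at least one sourceFile child is required. -->
    <xs:element minOccurs="1" maxOccurs="unbounded" name="sourceFile" type="dx:SourceFileType"/>
  </xs:sequence>
  <!-- The inline simpleType takes the place of type="xs:nonNegativeInteger". -->
  <xs:attribute name="count" use="required">
    <xs:simpleType>
      <xs:restriction base="xs:nonNegativeInteger">
        <!-- Disallow 0, so the attribute agrees with minOccurs="1". -->
        <xs:minInclusive value="1"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>
</xs:complexType>
```

Declaring the attribute as `type="xs:positiveInteger"` would have the same effect with less markup. Neither form makes the schema check that `count` equals the actual number of `sourceFile` children.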

Similar issues might be lurking in other *ListTypes, but I haven't checked yet.

Yours, Steffen

edeutsch commented 5 years ago

I believe the answer is: IF you emit the element, then you MUST have at least one . If you do not have any source files to encode, then do NOT emit the element. This convention is followed throughout the schema for all <****List> elements.

edeutsch commented 5 years ago

Oops, I always forget that GitHub messages swallow element names written in angle brackets. Let me try that again: IF you emit the element sourceFileList, then you MUST have at least one sourceFile element. If you do not have any source files to encode, then do NOT emit the element sourceFileList. This convention is followed throughout the schema for all ****List elements.
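
The convention can be sketched with two instance fragments. They assume that sourceFileList sits inside fileDescription, as in the mzML 1.1 layout. The `id`, `name` and `location` values are made up, and the other children of fileDescription are elided.

```xml
<!-- Case 1: one source file, so the list is emitted and count matches. -->
<fileDescription>
  <!-- fileContent and other children elided -->
  <sourceFileList count="1">
    <!-- Illustrative values only. -->
    <sourceFile id="sf1" name="example.raw" location="file:///data"/>
  </sourceFileList>
</fileDescription>

<!-- Case 2: no source files, so sourceFileList is left out entirely
     instead of being written with count="0". -->
<fileDescription>
  <!-- fileContent and other children elided -->
</fileDescription>
```

A writer following this convention never produces the empty `<sourceFileList count="0">` element that triggered the validation error above.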

sneumann commented 5 years ago

Indeed, I now have a fix for pwiz / msdata in mzR. Thanks, Steffen