Closed hechth closed 1 year ago
Also, the main purpose of the xs:ID type is to ensure a unique identifier for every XML element of that type. However, since the maxOccurs attribute is not defined for the element of type RunType, there can be only one run element per mzML file, making the unique identifier unnecessary.
RunType definition in mzML1.1.0.xsd (where maxOccurs would be specified):
<xs:sequence>
...
<xs:element name="run" type="dx:RunType" />
</xs:sequence>
From https://www.w3.org/TR/xmlschema-0/: "The default value for both the minOccurs and the maxOccurs attributes is 1."
Another reason why xs:ID might be needed is to provide a target for IDREF or IDREFS attributes. However, I couldn't find any element that targets the ID of the run.
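A search like the one described above can be automated. The following is a minimal sketch, assuming the schema declares its reference attributes with types ending in IDREF or IDREFS; the embedded schema is a hypothetical miniature (including the attribute name exampleRef), not the real mzML1.1.0.xsd, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical miniature schema for illustration, NOT the real mzML1.1.0.xsd.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="RunType">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="exampleRef" type="xs:IDREF"/>
  </xs:complexType>
  <xs:complexType name="ExampleType">
    <xs:attribute name="name" type="xs:string"/>
  </xs:complexType>
</xs:schema>"""


def find_idref_attributes(xsd_text):
    """Return (complexType name, attribute name, type) for each IDREF/IDREFS attribute."""
    root = ET.fromstring(xsd_text)
    hits = []
    for ctype in root.iter(f"{{{XS}}}complexType"):
        for attr in ctype.iter(f"{{{XS}}}attribute"):
            # Drop the namespace prefix so "xs:IDREF" and "xsd:IDREF" both match.
            local_type = attr.get("type", "").split(":")[-1]
            if local_type in ("IDREF", "IDREFS"):
                hits.append((ctype.get("name"), attr.get("name"), attr.get("type")))
    return hits


idref_attributes = find_idref_attributes(SCHEMA)
for owner, name, type_name in idref_attributes:
    print(f"{owner}.{name}: {type_name}")
```

The listing only shows which attributes are references; whether any of them is meant to point at the run id still has to be checked against the schema documentation or real files.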
Thus, it appears that xs:ID may be safely replaced with xs:string. Note that switching the type to string will require adding a regexp to ensure the id doesn't contain any whitespace.
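As a rough illustration of the whitespace check mentioned above, here is a minimal sketch in Python. It assumes "no whitespace" means one or more characters, none of which is a space, tab, line feed, or carriage return (the set that XML Schema regular expressions treat as whitespace). The function name is hypothetical, and the regexp actually adopted in the schema may differ.

```python
import re

# Assumption: an acceptable id is non-empty and contains no space, tab,
# line feed, or carriage return.
NO_WHITESPACE = re.compile(r"[^ \t\n\r]+")


def is_acceptable_run_id(run_id: str) -> bool:
    """Return True if run_id is non-empty and free of whitespace characters."""
    return NO_WHITESPACE.fullmatch(run_id) is not None


for candidate in ["20230101_sample_01", "sample 01", ""]:
    print(repr(candidate), is_acceptable_run_id(candidate))
```

Unlike xs:ID, this check accepts ids that start with a digit, which is the point of the proposed change.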
Closing since addressed in PR #9
We're currently implementing a validator tool in Galaxy that simply takes an mzml file and validates it against the XSD schema using pyxml or xmllinter, and we found that the <run> ... </run> element has an id attribute which has to be an xs:ID type, meaning it can't start with a number. But proteowizard seems to be filling this field with the sample name, which can contain a number at the start (and often does, like the position, order in the study, timestamp etc.), meaning that many mzml files whose run id starts with a number are technically invalid.
I personally don't see a reason why the id of the RunType can't be an xs:string - was there a specific reason for the decision?
I'd therefore like to propose to change the id attribute of the RunType from xs:ID to xs:string.
If you agree with the change I can open up a PR with the requested changes to the XSD file. Is there anything else that has to be adapted to make this change?
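To make the restriction described in this issue concrete, here is a minimal sketch of why such sample names are rejected. xs:ID values must be NCNames, which cannot start with a digit or contain a colon. The pattern below is an ASCII-only simplification of that rule (the real rule also admits many non-ASCII letters), and the function name is hypothetical.

```python
import re

# ASCII-only approximation of the NCName rule that xs:ID values must follow:
# first character is a letter or underscore; the rest may also be digits,
# '.', or '-'. Non-ASCII name characters are ignored in this simplification.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def looks_like_valid_xs_id(value: str) -> bool:
    """Return True if value matches the simplified NCName pattern."""
    return ASCII_NCNAME.fullmatch(value) is not None


for candidate in ["sample_01", "01_sample", "_01_sample"]:
    print(candidate, looks_like_valid_xs_id(candidate))
```

A check like this could be used to flag affected files before running the full XSD validation.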