Closed hechth closed 1 year ago
Since it is challenging to see the differences because these are new files, for the record, the changes are: 1) Bump in version numbers 2) Change from Windows line-endings to Unix line-endings. Will this cause a problem for anyone??? 3) Change:
<xs:complexType name="RunType">
<xs:attribute name="id" type="xs:ID" use="required">
<xs:annotation>
<xs:documentation>A unique identifier for this run.</xs:documentation>
to:
<xs:attribute name="id" use="required" type="xs:string">
<xs:annotation>
<xs:documentation>A unique string identifier for this run.</xs:documentation>
The primary reason for this is that many converters simply use the filename as the id, but xs:ID limits the permitted characters and as a result many mzML files in the wild are invalid because the encoded run id uses disallowed characters.
Since there seems little reason to insist that run id conforms to xs:ID, we will just change it to xs:string.
Does this seem sensible to @chambm and others?
An XML parser should not (edited) be affected by changing the line ending.
All seems well.
On Fri, Mar 10, 2023 at 1:34 PM Eric Deutsch @.***> wrote:
@.**** approved this pull request.
Seems good to me. Are we all fine with the change in line endings from Windows to Unix?
— Reply to this email directly, view it on GitHub https://github.com/HUPO-PSI/mzML/pull/9#pullrequestreview-1335410575, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAK4E6KOAT2D7EBWRKNQMRLW3NX2HANCNFSM6AAAAAAVWNOEQM . You are receiving this because your review was requested.Message ID: @.***>
@edeutsch @mobiusklein Thanks for having another look at it!
You're talking about the line endings in the schema itself changing? I can't imagine that causing a problem for any validating parser, but note that pwiz doesn't use one (it doesn't validate against the schema). Pwiz actually does encode the XML ID and IDREFs properly (escaping the starting character if it's not alphabetical or '_') but I can certainly believe a lot of other file writers don't.
Note that an empty string is not a valid ID, but is a valid xs:string
. It probably needs a constraint added to disallow that.
I may have been wrong about the line endings. This may have been an artifact of how I was trying to diff the files.
But the main issue is the run id change from xs:ID to xs:string. Any objections?
Your point about the empty string is useful. Do we want to take step to disallow that?
It is also useful to learn the msconvert does follow the xs:ID convention. What do you do when the filename begins with a number?
It is also useful to learn the msconvert does follow the xs:ID convention. What do you do when the filename begins with a number?
We replace it with underscore _x0000 and the zeros get replaced by the hexadecimal value of the original character (it would probably mess up on non-ASCII input though). Our xml reader reverses that encoding to get the original value back. Note that the non-first characters can only be A-Za-z0-9, underscore, and hyphen. So we have to do a lot of encoding in the middle of the ID too.
There's no reason to allow it to be empty, as that'd violate the other semantics of xsd:ID
. I've added the necessary XSD directive.
One nice thing about this change is it'll be backward compatible. Any xs:ID is also a valid xs:string. So even if I don't change pwiz it'll be creating valid 1.1.1 files. Although it won't necessarily fix existing writers that are writing 1.1.0 files without proper xs:ID semantics (I mean you have to get each one of those writers to start writing 1.1.1 as the version instead).
So we now have this thanks to Joshua:
<xs:attribute name="id" use="required">
<xs:annotation>
<xs:documentation>A unique string identifier for this run.</xs:documentation>
</xs:annotation>
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:minLength value="1"/>
</xsd:restriction>
</xsd:simpleType>
</xs:attribute>
with the minLength 1 added.
Is everyone okay with this? Approvals? comments?
thanks for all to look into this! @chambm if that is the case the error must lie somewhere else - probably in the ThermoRawFileConverter tool.
Closes #8
Changed the type of the id attribute to be a string allowing everything but whitespace characters and replacing everything that is a whitespace with a single space (which is invalid).