OpenChrom / openchrom

Visualization and Analysis of mass spectrometric and chromatographic data.
https://www.openchrom.net
Eclipse Public License 1.0

mzML output validation errors #370

Closed sneumann closed 1 year ago

sneumann commented 1 year ago

Hi, the mzML output can probably be used just fine by the majority of mzML-accepting tools out there. However, when validating the mzML output with something as picky as OpenMS FileInfo, I get a "no declaration found for element 'mzML'" error.

This has to do with our beloved XML namespaces. Currently, the root element does not declare a default namespace:

    <mzML
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          version="1.1.0"
          xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd">

which can be added fairly simply:

    <mzML xmlns="http://psi.hupo.org/ms/mzml"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          version="1.1.0"
          xsi:schemaLocation="http://psi.hupo.org/ms/mzml http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd">
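
A quick way to confirm the namespace fix before running a full schema validation is to look at the qualified name of the root element. The following is a minimal sketch using only Python's standard library; the function name `root_is_mzml` and the two inline sample documents are illustrative assumptions, not part of OpenChrom or OpenMS.

```python
import xml.etree.ElementTree as ET

MZML_NS = "http://psi.hupo.org/ms/mzml"


def root_is_mzml(xml_text: str) -> bool:
    """Return True if the root element is <mzML> in the PSI mzML namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace-uri}localname".
    return root.tag == f"{{{MZML_NS}}}mzML"


# Hypothetical minimal documents that mirror the two root elements above.
without_ns = (
    '<mzML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.1.0"/>'
)
with_ns = (
    '<mzML xmlns="http://psi.hupo.org/ms/mzml" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.1.0"/>'
)

print(root_is_mzml(without_ns))  # False
print(root_is_mzml(with_ns))     # True
```

This only checks the namespace of the root element; it does not replace validation against the XSD.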

After that, there are several errors about missing attributes required by the XSD:

Validating mzML file against XML schema version 1.1.0
 line 2 column 70: missing required attribute 'defaultInstrumentConfigurationRef'
 line 3 column 23: missing required attribute 'count'
 line 3 column 23: missing required attribute 'defaultDataProcessingRef'
 line 5 column 153: missing required attribute 'cvRef'
 line 11 column 23: missing required attribute 'count'
...
 line 106196 column 37: missing required attribute 'defaultDataProcessingRef'
 line 106197 column 51: missing required attribute 'id'
 line 106197 column 51: missing required attribute 'index'
 line 106202 column 81: missing required attribute 'accession'
 line 106202 column 81: missing required attribute 'name'
 line 106212 column 26: element 'mzML' has identity constraint key with no value
 line 106215 column 12: element 'run' is not allowed for content model '(cvList,fileDescription,referenceableParamGroupList?,sampleList?,softwareList,scanSettingsList?,instrumentConfigurationList,dataProcessingList,run)'
Failed - errors are listed above!

For example, a cvList is missing, and the cvParam elements lack their references to a CV:

    <cvList count="2">
      <cv id="MS" fullName="Proteomics Standards Initiative Mass Spectrometry Ontology" version="4.1.99" URI="https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo"/>
      <cv id="UO" fullName="Unit Ontology" version="09:04:2014" URI="https://raw.githubusercontent.com/bio-ontology-research-group/unit-ontology/master/unit.obo"/>
    </cvList>
    ...
    <cvParam cvRef="MS" ...>
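
To locate the affected cvParam elements in a large file, one could list every cvParam whose cvRef is missing or does not point to a cv declared in the cvList. This is a rough sketch with Python's standard library, assuming the root namespace has already been added; the function name `cvparams_with_bad_cvref` and the sample fragment are made up for illustration and are not OpenChrom output.

```python
import xml.etree.ElementTree as ET

NS = "{http://psi.hupo.org/ms/mzml}"


def cvparams_with_bad_cvref(xml_text):
    """Return accessions of cvParam elements with a missing or undeclared cvRef."""
    root = ET.fromstring(xml_text)
    # Collect the ids declared in <cvList>, skipping cv elements without an id.
    declared = {cv.get("id") for cv in root.iter(NS + "cv") if cv.get("id")}
    bad = []
    for param in root.iter(NS + "cvParam"):
        # A missing cvRef yields None, which is never in the declared set.
        if param.get("cvRef") not in declared:
            bad.append(param.get("accession"))
    return bad


# Hypothetical fragment: the second cvParam has no cvRef.
sample = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <cvList count="1">
    <cv id="MS" fullName="Proteomics Standards Initiative Mass Spectrometry Ontology"/>
  </cvList>
  <fileDescription>
    <fileContent>
      <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum"/>
      <cvParam accession="MS:1000127" name="centroid spectrum"/>
    </fileContent>
  </fileDescription>
</mzML>"""

print(cvparams_with_bad_cvref(sample))  # ['MS:1000127']
```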

List elements such as <spectrumList> and <scanList> are also missing their count attribute.
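
The missing count attributes can be found without a schema validator by scanning for list elements that lack one. The sketch below uses Python's standard library and a heuristic that is an assumption on my part: every element whose local name ends in "List" is treated as requiring count. The function name `lists_missing_count` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def lists_missing_count(xml_text):
    """Return local names of *List elements that have no count attribute."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        # Strip the "{namespace-uri}" prefix, if any, to get the local name.
        local = elem.tag.rsplit("}", 1)[-1]
        # Heuristic (assumption): names ending in "List" need a count attribute.
        if local.endswith("List") and elem.get("count") is None:
            missing.append(local)
    return missing


# Hypothetical fragment: spectrumList lacks count, scanList has it.
sample = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList>
      <spectrum id="scan=1" index="0">
        <scanList count="1"><scan/></scanList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

print(lists_missing_count(sample))  # ['spectrumList']
```

The sketch only reports a missing attribute. It does not compare the declared count with the number of entries, because some lists also hold parameter elements as children.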

Note that the "element 'mzML' has identity constraint key with no value" error might be a follow-up of the previous ones; I had a hard time understanding its root cause.

Yours, Steffen

Mailaender commented 1 year ago

This was released in https://github.com/OpenChrom/openchrom/wiki/Changelog#20230616