HUPO-PSI / mzML

Repository for mzML and the corresponding examples

Validation error in mzML with Thermo RAW with UV/PDA data from Orbitrap Elite #10

Open sneumann opened 1 year ago


Hi,

We have an issue with an msconvert-converted mzML file obtained from an Orbitrap Elite (software versions below). The mzML contains three <instrumentConfiguration> elements: one for the Orbitrap, one for the ion trap, and one for the UV detector. Some <spectrum> elements hold MS data and others hold UV/PDA data, each referencing its respective instrument configuration.

OpenMS FileInfo validation complains with

    Validating mzML file against XML schema version 1.1.0
    Validation error in file 'LAA_MM8_nFS.mzML' line 78 column 25:
         element 'detector' is not allowed for content model '(source+,analyzer+,detector+)'
    Failed - errors are listed above!

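This happens because the UV/PDA detector does not really come with a source or analyzer component, so its <componentList> contains only a <detector>, while the schema requires all three. That looks like an issue with the mzML specification (or its documentation) to me.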

I'd like to start discussing possible solutions:

1) Have msconvert drop the instrument information; what isn't there can't fail validation. A poor choice, because we lose that information.
2) Relax the mzML schema definition so that it does not enforce all three of (source+,analyzer+,detector+).
3) Use empty/null values for source+ and analyzer+, but require their presence to make validation happy.
4) Something else.
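
To make option 2 concrete, here is a rough sketch of what a relaxed componentList definition might look like. It assumes the type is declared as a plain sequence of the three elements, which is what the reported content model suggests. The type names and the `dx:` prefix are assumptions and should be checked against the XSD linked below.

```xml
<!-- Sketch only: type names and the dx: prefix are assumptions, not copied from the schema. -->
<xs:complexType name="ComponentListType">
  <xs:sequence>
    <!-- minOccurs="0" makes source and analyzer optional -->
    <xs:element name="source" type="dx:SourceComponentType" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="analyzer" type="dx:AnalyzerComponentType" minOccurs="0" maxOccurs="unbounded"/>
    <!-- detector keeps the default minOccurs="1" -->
    <xs:element name="detector" type="dx:DetectorComponentType" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="count" type="xs:nonNegativeInteger" use="required"/>
</xs:complexType>
```

With such a change the content model would become (source*,analyzer*,detector+), so a detector-only componentList would pass schema validation. Existing files would stay valid, but readers that assume a source and analyzer are always present might need adjusting.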

Ideas?

Yours, Steffen

mzML XSD with componentList definition: https://github.com/HUPO-PSI/mzML/blob/81e0145f65dd7abf56a9bea51bfdf66ed7767905/schema/schema_1.1/mzML1.1.1.xsd#L333

The instrument components contain 0:n cvParams, so they could be left empty: https://github.com/HUPO-PSI/mzML/blob/81e0145f65dd7abf56a9bea51bfdf66ed7767905/schema/schema_1.1/mzML1.1.1.xsd#L137
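
As a sketch of option 3, the UV detector configuration (IC3 in the converted file) could carry empty placeholder components. The `order` values and `count="3"` are assumptions made for illustration, and whether a semantic validator accepts components without cvParams has not been checked here.

```xml
<instrumentConfiguration id="IC3">
  <referenceableParamGroupRef ref="CommonInstrumentParams"/>
  <componentList count="3">
    <!-- placeholder: the PDA has no ion source -->
    <source order="1"/>
    <!-- placeholder: the PDA has no mass analyzer -->
    <analyzer order="2"/>
    <detector order="3">
      <cvParam cvRef="MS" accession="MS:1000621" name="photodiode array detector" value=""/>
    </detector>
  </componentList>
</instrumentConfiguration>
```

This would satisfy the (source+,analyzer+,detector+) content model without a schema change, at the cost of describing components that do not physically exist.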

    <softwareList count="2">
      <software id="Xcalibur" version="2.7.0 SP1">
        <cvParam cvRef="MS" accession="MS:1000532" name="Xcalibur" value=""/>
      </software>
      <software id="pwiz" version="3.0.22242">
        <cvParam cvRef="MS" accession="MS:1000615" name="ProteoWizard software" value=""/>
      </software>
    </softwareList>
...
      <instrumentConfiguration id="IC3">
        <referenceableParamGroupRef ref="CommonInstrumentParams"/>
        <componentList count="1">
          <detector order="1">
            <cvParam cvRef="MS" accession="MS:1000621" name="photodiode array detector" value=""/>
          </detector>
        </componentList>

Raw data available at https://drive.google.com/file/d/1EEUO_F1X1PLYe10qsKYTNwRO5FnHkNqu/view?usp=sharing

mzML available at https://drive.google.com/file/d/1ogA7lIfeYAZKQ7vdwA3L-jeu9Gbd8W9v/view?usp=sharing

Software/Version: Xcalibur 2.2 - Qual Browser Thermo Xcalibur 2.2 SP1.48 (Analysis), Orbitrap Elite 2.7 - LTQ Tune Plus Version 2.7.0.1103 SP1 (Control)