HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results
https://hupo-psi.github.io/mzTab

Improperly formatted example mztab-m files for spectra reference #178

Closed mwang87 closed 4 years ago

mwang87 commented 4 years ago

Spectra are referenced incorrectly: index= is incompatible with the ms_run id_format.

id_format is defined as "scan number only nativeID format"

https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/StandardMix_negative_exportSpeciesLevel.mzTab#L23

however, the reference is using index=

https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/StandardMix_negative_exportSpeciesLevel.mzTab#L705

This is disallowed according to the definition:

https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#spectra_ref

The value in the spectra_ref should instead be:

"scan=xsd:nonNegativeInteger"

Ming
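The mismatch reported above can be checked mechanically. The following is a minimal sketch, not part of any mzTab tooling: the function name `check_spectra_ref` and the pattern table are illustrative assumptions, and they cover only the two nativeID formats mentioned in this thread. It also assumes a single id_format for all ms_run entries, which a real file need not satisfy.

```python
import re

# Assumed subset: only the nativeID formats discussed in this thread.
NATIVE_ID_PATTERNS = {
    # scan number only nativeID format: scan=xsd:nonNegativeInteger
    "MS:1000776": re.compile(r"scan=[0-9]+"),
    # Thermo nativeID format: controllerType=... controllerNumber=... scan=...
    "MS:1000768": re.compile(
        r"controllerType=[0-9]+ controllerNumber=[0-9]+ scan=[0-9]+"
    ),
}

MS_RUN = re.compile(r"ms_run\[[0-9]+\]")


def check_spectra_ref(spectra_ref, id_format_accession):
    """Return the references in a spectra_ref cell that do not match the id_format."""
    pattern = NATIVE_ID_PATTERNS[id_format_accession]
    bad = []
    # Multiple references in one cell are separated by "|".
    for ref in spectra_ref.split("|"):
        ref = ref.strip()
        # Split "ms_run[1]:scan=5" at the first colon only.
        run, _, native_id = ref.partition(":")
        if not MS_RUN.fullmatch(run) or not pattern.fullmatch(native_id):
            bad.append(ref)
    return bad


# An index= reference is rejected when the run declares MS:1000776.
print(check_spectra_ref("ms_run[1]:index=5 | ms_run[1]:scan=6", "MS:1000776"))
```

The digits-only pattern is a simplification of xsd:nonNegativeInteger, which is enough to flag the index= references described in this issue.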

mwang87 commented 4 years ago

Similar issue here:

https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/MTBLS263.mztab

The references are to mzML files, but index= is used in spectra_ref rather than the "mzML unique identifier" format (xsd:string).

nilshoffmann commented 4 years ago

Confirmed. scan number only nativeID format (MS:1000776): "Native format defined by scan=xsd:nonNegativeInteger." [PSI:MS]

mwang87 commented 4 years ago

Thanks!

I think there is still an error in the format of the mzTab. The references were fixed by simply replacing index= with scan= in this file:

https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/MTBLS263.mztab

However, index is 0-based in the specification:

https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#spectra_ref

and scan= is defined as an identifier supplied by the vendor. In these particular examples, the scan identifiers seem to use 1-based numbering.

Ming

nilshoffmann commented 4 years ago

The mzML files referenced here were converted from Thermo raw files. So the proper way of referencing scans would be to use "controllerType=0 controllerNumber=1 scan=1", following the MS:1000768 Thermo nativeID format, in which scan numbering starts at 1.
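The relationship between a 0-based index and a Thermo nativeID can be sketched as follows. This is a hedged illustration only: the helper name `index_to_thermo_native_id` is hypothetical, and the mapping scan = index + 1 is an assumption that holds only when every scan of the raw file was written to the mzML file in order. The mzML excerpt quoted later in this thread (index="273", scan=274) is consistent with it.

```python
def index_to_thermo_native_id(index, controller_type=0, controller_number=1):
    """Convert a 0-based spectrum index to a Thermo nativeID string.

    Assumption: scan == index + 1, i.e. no scans were dropped or reordered
    during conversion. The controller defaults mirror the values in this thread.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    return (
        f"controllerType={controller_type} "
        f"controllerNumber={controller_number} "
        f"scan={index + 1}"
    )


print(index_to_thermo_native_id(273))
```

When the assumption does not hold, the safer route is to look up the `id` attribute of the spectrum at that index in the mzML file itself rather than compute it.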

nilshoffmann commented 4 years ago

@jmrein Could you please have a look at this and maybe export a new file with fixed spectra_refs? I think the MTBLS263 file was contributed by you originally?

nilshoffmann commented 4 years ago

@jmrein Please see https://github.com/HUPO-PSI/mzTab/commit/a847758e09591f6535864f18c9654a6d63423561 for a proposed fix.

mwang87 commented 4 years ago

Looking at the actual data, I think this fix is incorrect. Specifically, the scan numbers are wrong: they point to MS1 spectra and not to the appropriate MS2 spectra. I think it's because the scan numbers need to be incremented by 1.

nilshoffmann commented 4 years ago

Acknowledged! Waiting for @jmrein to provide an update.

nilshoffmann commented 4 years ago

Looking at ms_run[1], which is this file: https://www.ebi.ac.uk/metabolights/MTBLS263/files/3injections_inj1_POS.mzML

If I go to the first referenced scan in the SME part:

SEH SME_ID  evidence_input_id   database_identifier chemical_formula    smiles  inchi   chemical_name   uri derivatized_form    adduct_ion  exp_mass_to_charge  charge  theoretical_mass_to_charge  spectra_ref identification_method   ms_level    id_confidence_measure[1]    id_confidence_measure[2]    id_confidence_measure[3]    rank    opt_global_retention_time_in_seconds    opt_global_retention_time_in_seconds_database
SME 1 413.81_114.0654m/z CHEBI:16737 C4H7N3O null null Creatinine null null [M+H]+ 114.0654 1 114.0662 ms_run[1]:controllerType=0 controllerNumber=1 scan=274 | ms_run[1]:controllerType=0 controllerNumber=1 scan=290 | ms_run[2]:controllerType=0 controllerNumber=1 scan=274 | ms_run[2]:controllerType=0 controllerNumber=1 scan=290 | ms_run[3]:controllerType=0 controllerNumber=1 scan=270 | ms_run[3]:controllerType=0 controllerNumber=1 scan=288 | ms_run[4]:controllerType=0 controllerNumber=1 scan=268 | ms_run[4]:controllerType=0 controllerNumber=1 scan=284 | ms_run[5]:controllerType=0 controllerNumber=1 scan=266 | ms_run[5]:controllerType=0 controllerNumber=1 scan=282 | ms_run[6]:controllerType=0 controllerNumber=1 scan=266 | ms_run[6]:controllerType=0 controllerNumber=1 scan=282 [,,Progenesis MetaScope,] [MS,MS:1000511,ms level,2] 56.4424 0 99.6059 1 413.81 414

Then the ms_run scan reference points to the following scan in the mzML file, which is an MS2 scan whose selected ion m/z (114.0655) also matches the evidence:

        <spectrum index="273" id="controllerType=0 controllerNumber=1 scan=274" defaultArrayLength="490">
          <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
          <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
          <cvParam cvRef="MS" accession="MS:1000128" name="profile spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000504" name="base peak m/z" value="109.991729736328" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000505" name="base peak intensity" value="14827.796875" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
          <cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="1.5469434375e05"/>
          <cvParam cvRef="MS" accession="MS:1000528" name="lowest observed m/z" value="50.000223699543" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000527" name="highest observed m/z" value="125.5291666645" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000796" name="spectrum title" value="20130909_SAM929_POS1.274.274.1 File:&quot;20130909_SAM929_POS1.RAW&quot;, NativeID:&quot;controllerType=0 controllerNumber=1 scan=274&quot;"/>
          <scanList count="1">
            <cvParam cvRef="MS" accession="MS:1000795" name="no combination" value=""/>
            <scan>
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="6.740293333333" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
              <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value="FTMS + p ESI d w Full ms2 114.07@cid40.00 [50.00-125.00]"/>
              <cvParam cvRef="MS" accession="MS:1000616" name="preset scan configuration" value="2"/>
              <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="110.873481750488" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
              <userParam name="[Thermo Trailer Extra]Monoisotopic M/Z:" value="114.06556701660156" type="xsd:float"/>
              <scanWindowList count="1">
                <scanWindow>
                  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="50.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="125.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                </scanWindow>
              </scanWindowList>
            </scan>
          </scanList>
          <precursorList count="1">
            <precursor spectrumRef="controllerType=0 controllerNumber=1 scan=273">
              <isolationWindow>
                <cvParam cvRef="MS" accession="MS:1000827" name="isolation window target m/z" value="114.07" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000828" name="isolation window lower offset" value="1.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000829" name="isolation window upper offset" value="1.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              </isolationWindow>
              <selectedIonList count="1">
                <selectedIon>
                  <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="114.065567016602" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="1"/>
                  <cvParam cvRef="MS" accession="MS:1000042" name="peak intensity" value="1.716534875e06" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
                </selectedIon>
              </selectedIonList>
              <activation>
                <cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" value=""/>
                <cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="40.0" unitCvRef="UO" unitAccession="UO:0000266" unitName="electronvolt"/>
              </activation>
            </precursor>
          </precursorList>
...

So the fix seems to be valid for this scan ref.
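The manual check above can be sketched in code. This is a minimal illustration under stated assumptions: the embedded XML is a trimmed copy of the spectrum quoted above with the mzML namespace omitted, the helper names `cv_value` and `matches_evidence` are hypothetical, and the 0.001 m/z tolerance is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the quoted spectrum; a real mzML file declares a namespace.
SNIPPET = """
<spectrum index="273" id="controllerType=0 controllerNumber=1 scan=274" defaultArrayLength="490">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <precursorList count="1">
    <precursor spectrumRef="controllerType=0 controllerNumber=1 scan=273">
      <selectedIonList count="1">
        <selectedIon>
          <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="114.065567016602"/>
        </selectedIon>
      </selectedIonList>
    </precursor>
  </precursorList>
</spectrum>
""".strip()


def cv_value(element, accession):
    """Return the value of the first cvParam with this accession, or None."""
    for param in element.iter("cvParam"):
        if param.get("accession") == accession:
            return param.get("value")
    return None


def matches_evidence(spectrum, native_id, ms_level, exp_mz, tol=0.001):
    """Check nativeID, ms level and selected ion m/z against an SME row."""
    if spectrum.get("id") != native_id:
        return False
    level = cv_value(spectrum, "MS:1000511")  # ms level
    mz = cv_value(spectrum, "MS:1000744")  # selected ion m/z
    if level is None or mz is None:
        return False
    return int(level) == ms_level and abs(float(mz) - exp_mz) <= tol


spectrum = ET.fromstring(SNIPPET)
# Values taken from the SME row: scan=274, ms level 2, exp_mass_to_charge 114.0654.
print(matches_evidence(spectrum, "controllerType=0 controllerNumber=1 scan=274", 2, 114.0654))
```

Running the same check against every reference in the spectra_ref column would surface any row that points to an MS1 scan or to a different precursor.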

@mwang87 Do you have a specific scan reference where this fails? It would help tremendously if you could provide an error message.

mwang87 commented 4 years ago

Thanks @nilshoffmann, I didn't realize you had already pushed a fix. I'll test with the updated version, but superficially it looks good.

mwang87 commented 4 years ago

The latest update looks great. We've tested our parsers and they seem to be finding the correct MS/MS spectra now.

Thanks so much!