Closed mwang87 closed 4 years ago
Similar issue here:
https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/MTBLS263.mztab
The references are to mzML files, but spectra_ref uses index= rather than the
mzML unique identifier (xsd:string).
Confirmed, scan number only nativeID format MS:1000776 Native format defined by scan=xsd:nonNegativeInteger. [ PSI:MS ]
Thanks!
I think there is still an error in the format of the mzTab. The fix appears to have simply replaced index= with scan= in the references, without changing the numbers, in this file:
https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/MTBLS263.mztab
index is 0-based in the specification, whereas scan= is defined as an identifier supplied by the vendor. In these particular examples, the scan numbers appear to be 1-based.
Ming
The original mzML files from here are Thermo raw files converted to mzML. So the proper way of referencing scans would be to use "controllerType=0 controllerNumber=1 scan=1", following MS:1000768 Thermo nativeID format which has scan starting from 1.
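A minimal sketch of that conversion, assuming a single Thermo controller (controllerType=0 controllerNumber=1) and contiguous scan numbering starting at 1, so that scan = index + 1. The function name `index_ref_to_thermo_ref` is hypothetical and not part of any mzTab library.

```python
import re

# Matches references such as "ms_run[1]:index=273" (index is 0-based in mzML).
_INDEX_REF = re.compile(r"^(ms_run\[\d+\]):index=(\d+)$")


def index_ref_to_thermo_ref(ref: str) -> str:
    """Rewrite an index-based spectra_ref into Thermo nativeID form (MS:1000768).

    Assumption: one controller (controllerType=0 controllerNumber=1) and
    contiguous scan numbering starting at 1, so scan = index + 1.
    """
    match = _INDEX_REF.match(ref.strip())
    if match is None:
        raise ValueError(f"not an index-based spectra_ref: {ref!r}")
    ms_run, index = match.group(1), int(match.group(2))
    return f"{ms_run}:controllerType=0 controllerNumber=1 scan={index + 1}"


print(index_ref_to_thermo_ref("ms_run[1]:index=273"))
```

If a raw file had gaps in its scan numbering or more than one controller, this mapping would not hold, and the nativeID would have to be read from the mzML `id` attribute instead.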
@jmrein Could you please have a look at this and maybe export a new file with fixed spectra_refs? I think the MTBLS263 file was contributed by you originally?
@jmrein Please see https://github.com/HUPO-PSI/mzTab/commit/a847758e09591f6535864f18c9654a6d63423561 for a proposed fix.
Looking at the actual data, I think this fix is incorrect. Specifically, the scan numbers are wrong: they point to MS1 spectra rather than the appropriate MS2 spectra. I think it's because the scan numbers need to be incremented by 1.
Acknowledged! Waiting for @jmrein to provide an update.
Looking at ms_run[1], which is this file: https://www.ebi.ac.uk/metabolights/MTBLS263/files/3injections_inj1_POS.mzML
If I go to the first referenced scan in the SME part:
SEH SME_ID evidence_input_id database_identifier chemical_formula smiles inchi chemical_name uri derivatized_form adduct_ion exp_mass_to_charge charge theoretical_mass_to_charge spectra_ref identification_method ms_level id_confidence_measure[1] id_confidence_measure[2] id_confidence_measure[3] rank opt_global_retention_time_in_seconds opt_global_retention_time_in_seconds_database
SME 1 413.81_114.0654m/z CHEBI:16737 C4H7N3O null null Creatinine null null [M+H]+ 114.0654 1 114.0662 ms_run[1]:controllerType=0 controllerNumber=1 scan=274 | ms_run[1]:controllerType=0 controllerNumber=1 scan=290 | ms_run[2]:controllerType=0 controllerNumber=1 scan=274 | ms_run[2]:controllerType=0 controllerNumber=1 scan=290 | ms_run[3]:controllerType=0 controllerNumber=1 scan=270 | ms_run[3]:controllerType=0 controllerNumber=1 scan=288 | ms_run[4]:controllerType=0 controllerNumber=1 scan=268 | ms_run[4]:controllerType=0 controllerNumber=1 scan=284 | ms_run[5]:controllerType=0 controllerNumber=1 scan=266 | ms_run[5]:controllerType=0 controllerNumber=1 scan=282 | ms_run[6]:controllerType=0 controllerNumber=1 scan=266 | ms_run[6]:controllerType=0 controllerNumber=1 scan=282 [,,Progenesis MetaScope,] [MS,MS:1000511,ms level,2] 56.4424 0 99.6059 1 413.81 414
Then the ms_run reference scan ID points to the following spectrum in the mzML file, which is an MS2 scan and whose selected precursor ion m/z also matches, at 114.0655:
<spectrum index="273" id="controllerType=0 controllerNumber=1 scan=274" defaultArrayLength="490">
<cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
<cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
<cvParam cvRef="MS" accession="MS:1000128" name="profile spectrum" value=""/>
<cvParam cvRef="MS" accession="MS:1000504" name="base peak m/z" value="109.991729736328" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<cvParam cvRef="MS" accession="MS:1000505" name="base peak intensity" value="14827.796875" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
<cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="1.5469434375e05"/>
<cvParam cvRef="MS" accession="MS:1000528" name="lowest observed m/z" value="50.000223699543" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<cvParam cvRef="MS" accession="MS:1000527" name="highest observed m/z" value="125.5291666645" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<cvParam cvRef="MS" accession="MS:1000796" name="spectrum title" value="20130909_SAM929_POS1.274.274.1 File:&quot;20130909_SAM929_POS1.RAW&quot;, NativeID:&quot;controllerType=0 controllerNumber=1 scan=274&quot;"/>
<scanList count="1">
<cvParam cvRef="MS" accession="MS:1000795" name="no combination" value=""/>
<scan>
<cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="6.740293333333" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
<cvParam cvRef="MS" accession="MS:1000512" name="filter string" value="FTMS + p ESI d w Full ms2 114.07@cid40.00 [50.00-125.00]"/>
<cvParam cvRef="MS" accession="MS:1000616" name="preset scan configuration" value="2"/>
<cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="110.873481750488" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
<userParam name="[Thermo Trailer Extra]Monoisotopic M/Z:" value="114.06556701660156" type="xsd:float"/>
<scanWindowList count="1">
<scanWindow>
<cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="50.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="125.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
</scanWindow>
</scanWindowList>
</scan>
</scanList>
<precursorList count="1">
<precursor spectrumRef="controllerType=0 controllerNumber=1 scan=273">
<isolationWindow>
<cvParam cvRef="MS" accession="MS:1000827" name="isolation window target m/z" value="114.07" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<cvParam cvRef="MS" accession="MS:1000828" name="isolation window lower offset" value="1.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<cvParam cvRef="MS" accession="MS:1000829" name="isolation window upper offset" value="1.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
</isolationWindow>
<selectedIonList count="1">
<selectedIon>
<cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="114.065567016602" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
<cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="1"/>
<cvParam cvRef="MS" accession="MS:1000042" name="peak intensity" value="1.716534875e06" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
</selectedIon>
</selectedIonList>
<activation>
<cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" value=""/>
<cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="40.0" unitCvRef="UO" unitAccession="UO:0000266" unitName="electronvolt"/>
</activation>
</precursor>
</precursorList>
...
So the fix seems to be valid for this scan ref.
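The manual check above could be automated along these lines. This is only a sketch: `SAMPLE_MZML` is a trimmed, hypothetical stand-in for the real mzML file, and the helper names `ms_level_by_native_id` and `split_spectra_ref` are made up for illustration. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Trimmed, hypothetical stand-in for the real mzML file (two spectra only).
SAMPLE_MZML = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="r1">
    <spectrumList count="2">
      <spectrum index="272" id="controllerType=0 controllerNumber=1 scan=273" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="273" id="controllerType=0 controllerNumber=1 scan=274" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def ms_level_by_native_id(mzml_text: str) -> Dict[str, int]:
    """Map each spectrum's nativeID (the id attribute) to its ms level."""
    root = ET.fromstring(mzml_text)
    levels: Dict[str, int] = {}
    for spectrum in root.iterfind(".//mz:spectrum", MZML_NS):
        for cv in spectrum.iterfind("mz:cvParam", MZML_NS):
            if cv.get("accession") == "MS:1000511":  # ms level
                levels[spectrum.get("id")] = int(cv.get("value"))
    return levels


def split_spectra_ref(field: str) -> List[Tuple[str, str]]:
    """Split a '|'-separated spectra_ref field into (ms_run, nativeID) pairs."""
    refs = []
    for part in field.split("|"):
        ms_run, native_id = part.strip().split(":", 1)
        refs.append((ms_run, native_id))
    return refs


levels = ms_level_by_native_id(SAMPLE_MZML)
field = "ms_run[1]:controllerType=0 controllerNumber=1 scan=274"
for ms_run, native_id in split_spectra_ref(field):
    # A reference is suspicious if the nativeID is missing or is not MS2.
    print(ms_run, native_id, "-> ms level", levels.get(native_id))
```

For a real file, each ms_run would need its own lookup table built from the corresponding mzML, since the same nativeID occurs in every run.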
@mwang87 Do you have a specific scan reference where this fails? It would help tremendously, if you could provide an error message.
Thanks @nilshoffmann, I didn't realize you had already pushed out a fix. I'll test with the updated version, but superficially it looks good.
The latest update looks great. We've tested our parsers and it seems to be finding the correct MS/MS spectra now.
Thanks so much!
Spectra are referenced incorrectly: index= is incompatible with the ms_run id_format.
id_format is defined as "scan number only nativeID format"
https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/StandardMix_negative_exportSpeciesLevel.mzTab#L23
however, the reference is using index=
https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/StandardMix_negative_exportSpeciesLevel.mzTab#L705
This is disallowed according to the definition:
https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#spectra_ref
The value in spectra_ref should be:
"scan=xsd:nonNegativeInteger"
Ming
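The mismatch described above can be checked mechanically. A rough sketch, assuming only the two nativeID formats named in this thread; the regular expressions are a loose approximation of the CV definitions (they accept any run of digits), and `spectra_ref_matches` is a hypothetical helper, not an mzTab validator API.

```python
import re

# nativeID patterns keyed by PSI-MS accession, as quoted in this thread.
# Approximation: \d+ is looser than the xsd integer types in the CV definitions.
NATIVE_ID_PATTERNS = {
    "MS:1000776": re.compile(r"^scan=\d+$"),  # scan number only nativeID format
    "MS:1000768": re.compile(
        r"^controllerType=\d+ controllerNumber=\d+ scan=\d+$"
    ),  # Thermo nativeID format
}


def spectra_ref_matches(ref: str, id_format_accession: str) -> bool:
    """Return True if a single spectra_ref agrees with the declared id_format."""
    ms_run, sep, native_id = ref.strip().partition(":")
    if not sep or re.fullmatch(r"ms_run\[\d+\]", ms_run) is None:
        return False
    pattern = NATIVE_ID_PATTERNS.get(id_format_accession)
    if pattern is None:
        raise KeyError(f"no pattern registered for {id_format_accession}")
    return pattern.match(native_id) is not None


# index= is rejected when the ms_run declares the scan-number-only format.
print(spectra_ref_matches("ms_run[1]:index=5", "MS:1000776"))
print(spectra_ref_matches("ms_run[1]:scan=5", "MS:1000776"))
```

A format check like this only catches the wrong key (index= versus scan=); it cannot tell whether the number itself points at the intended spectrum.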