ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0

msconvert.exe raw output MGF differs from proteome-discoverer #2152

Closed animesh closed 2 years ago

animesh commented 2 years ago

I am using the following command to convert a Thermo/HF raw file into MGF:

"F:\OneDrive - NTNU\ProteoWizard 3.0.22155.0ff594f 64-bit\msconvert.exe"  --filter "peakPicking true 1-" --mgf l:\190128_robin_WT_5.raw
format: MGF
outputPath: .
extension: .mgf
contactFilename:
runIndexSet:

spectrum list filters:
  peakPicking true 1-

chromatogram list filters:

filenames:
  l:\190128_robin_WT_5.raw

processing file: l:\190128_robin_WT_5.raw
calculating source file checksums

writing output file: .\190128_robin_WT_5.mgf

and I see that the result differs from the output of Proteome Discoverer 2.5 (workflow PWF_QE_Basic_MGFx - Copy.pdProcessingWF.txt).

For example, Proteome Discoverer 2.5 gives

BEGIN IONS
TITLE=File: "F:\190128_robin_WT_5.raw"; SpectrumID: "58"; PrecursorID: "0"; scans: "58"
PEPMASS=562.91254 2160.42041
CHARGE=2+
RTINSECONDS=21
SCANS=58
93.28708 1613.25
211.51585 1544.29
213.93216 18616.9
549.95520 1562.23
END IONS

while msconvert says

BEGIN IONS
TITLE=190128_robin_WT_5.58.58.2
RTINSECONDS=21.7782996
PEPMASS=562.860046386719 33584.291259765625
CHARGE=2+
93.28707886 1613.2463378906
211.5158539 1544.2867431641
213.9321594 18616.86328125
301.7001648 3080.8708496094
549.9552002 1562.2274169922
END IONS

What could be the reason for the differences, and which output is closer to correct?

I can share the raw file if need be.

chambm commented 2 years ago

I see several differences but only number 3 seems significant:

  1. Scan time in seconds is truncated to an integer for some reason in PD's output.
  2. Precursor intensity is calculated differently.
  3. Precursor mass is different. Possibly some kind of refinement is being done by PD and not msconvert. You could try the precursorRefine and precursorRecalculation filters.
  4. An exception peak is included by msconvert but not by PD.
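
These differences can be checked mechanically. The following is a minimal sketch, assuming the two spectra quoted earlier in this issue are pasted in as strings. The helper names (parse_mgf_block, ppm_difference, unmatched_peaks) and the 0.001 m/z matching tolerance are illustrative choices, not part of ProteoWizard or PD.

```python
# Spectrum for scan 58 as written by Proteome Discoverer 2.5 (copied from this issue).
PD_BLOCK = r"""
BEGIN IONS
TITLE=File: "F:\190128_robin_WT_5.raw"; SpectrumID: "58"; PrecursorID: "0"; scans: "58"
PEPMASS=562.91254 2160.42041
CHARGE=2+
RTINSECONDS=21
SCANS=58
93.28708 1613.25
211.51585 1544.29
213.93216 18616.9
549.95520 1562.23
END IONS
"""

# The same scan as written by msconvert (copied from this issue).
MSCONVERT_BLOCK = """
BEGIN IONS
TITLE=190128_robin_WT_5.58.58.2
RTINSECONDS=21.7782996
PEPMASS=562.860046386719 33584.291259765625
CHARGE=2+
93.28707886 1613.2463378906
211.5158539 1544.2867431641
213.9321594 18616.86328125
301.7001648 3080.8708496094
549.9552002 1562.2274169922
END IONS
"""


def parse_mgf_block(text):
    """Parse one BEGIN IONS/END IONS block into (headers, peaks)."""
    headers, peaks = {}, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line in ("BEGIN IONS", "END IONS"):
            continue
        if "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)  # header line such as PEPMASS=...
            headers[key] = value
        else:
            mz, intensity = line.split()[:2]  # peak line: m/z and intensity
            peaks.append((float(mz), float(intensity)))
    return headers, peaks


def ppm_difference(mz_a, mz_b):
    """Difference of mz_a relative to mz_b, in parts per million."""
    return (mz_a - mz_b) / mz_b * 1e6


def unmatched_peaks(peaks_a, peaks_b, tol=1e-3):
    """m/z values in peaks_a with no peak in peaks_b within tol (m/z units)."""
    return [mz for mz, _ in peaks_a
            if not any(abs(mz - other) <= tol for other, _ in peaks_b)]


pd_headers, pd_peaks = parse_mgf_block(PD_BLOCK)
mc_headers, mc_peaks = parse_mgf_block(MSCONVERT_BLOCK)

pd_mz = float(pd_headers["PEPMASS"].split()[0])
mc_mz = float(mc_headers["PEPMASS"].split()[0])
delta_ppm = ppm_difference(pd_mz, mc_mz)
extra_in_msconvert = unmatched_peaks(mc_peaks, pd_peaks)

print("RTINSECONDS:", pd_headers["RTINSECONDS"], "vs", mc_headers["RTINSECONDS"])
print("precursor m/z difference (ppm):", round(delta_ppm, 2))
print("peaks only in msconvert output:", extra_in_msconvert)
```

For this scan, the only peak reported as unmatched is the one at m/z 301.7001648, which corresponds to item 4 above.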
animesh commented 2 years ago

I tried several experiments based on your suggestions, and none of them gets close to the PD result:

"F:\OneDrive - NTNU\ProteoWizard 3.0.22155.0ff594f 64-bit\msconvert.exe"  --filter "precursorRefine peakPicking true 1-" --mgf "F:\OneDrive - NTNU\190128_robin_WT_5.raw"
BEGIN IONS
TITLE=190128_robin_WT_5.58.58.2
RTINSECONDS=21.7782996
PEPMASS=562.859787097224 129107.766181900006
CHARGE=2+
93.28707886 1613.2463378906
211.5158539 1544.2867431641
213.9321594 18616.86328125
301.7001648 3080.8708496094
549.9552002 1562.2274169922
END IONS

"F:\OneDrive - NTNU\ProteoWizard 3.0.22155.0ff594f 64-bit\msconvert.exe"  --filter "precursorRecalculation peakPicking true 1-" --mgf "F:\OneDrive - NTNU\190128_robin_WT_5.raw"

BEGIN IONS
TITLE=190128_robin_WT_5.58.58.2
RTINSECONDS=21.7782996
PEPMASS=562.860046386719 129107.766181900006
CHARGE=2+
93.28707886 1613.2463378906
211.5158539 1544.2867431641
213.9321594 18616.86328125
301.7001648 3080.8708496094
549.9552002 1562.2274169922
END IONS

"F:\OneDrive - NTNU\ProteoWizard 3.0.22155.0ff594f 64-bit\msconvert.exe"  --filter "precursorRecalculation precursorRefine peakPicking true 1-" --mgf "F:\OneDrive - NTNU\190128_robin_WT_5.raw"

BEGIN IONS
TITLE=190128_robin_WT_5.58.58.2
RTINSECONDS=21.7782996
PEPMASS=562.860046386719 129107.766181900006
CHARGE=2+
93.28707886 1613.2463378906
211.5158539 1544.2867431641
213.9321594 18616.86328125
301.7001648 3080.8708496094
549.9552002 1562.2274169922
END IONS

"F:\OneDrive - NTNU\ProteoWizard 3.0.22155.0ff594f 64-bit\msconvert.exe"  --filter "precursorRecalculation precursorRefine" --mgf "F:\OneDrive - NTNU\190128_robin_WT_5.raw"

BEGIN IONS
TITLE=190128_robin_WT_5.157.157.2
RTINSECONDS=58.7746344
PEPMASS=562.860107421875 136355.315918000008
CHARGE=2+
83.74562836 1534.3530273438
213.936264 11447.2548828125
213.9624481 5957.8374023438
301.7164307 4459.0385742188
END IONS

"F:\OneDrive - NTNU\ProteoWizard 3.0.22155.0ff594f 64-bit\msconvert.exe"  --filter "precursorRefine" --mgf "F:\OneDrive - NTNU\190128_robin_WT_5.raw"

BEGIN IONS
TITLE=190128_robin_WT_5.157.157.2
RTINSECONDS=58.7746344
PEPMASS=562.859678994197 136355.315918000008
CHARGE=2+
83.74562836 1534.3530273438
213.936264 11447.2548828125
213.9624481 5957.8374023438
301.7164307 4459.0385742188
END IONS

I also asked PD support, and they are not willing to share the details; some patent issue, it seems. I had already tried their own C# library (https://github.com/animesh/RawRead), which also differs from the PD results. Its output is similar to msconvert's, but without the "exception peak" (where is this actually useful, @chambm?).

mono RawRead.exe 190128_robin_WT_5.raw
BEGIN IONS
TITLE=58    4   SCANS=FTMS + c NSI d Full ms2 562.8600@hcd29.00 [77.6667-1165.0000]
RTINSECONDS=21.7782996
PEPMASS=562.860046386719    213.9321644 18616.863
CHARGE=2+
93.2870788574219 1613.24633789063
211.515853881836 1544.28674316406
213.932159423828 18616.86328125
549.955200195313 1562.22741699219
END IONS

I am still experimenting with the C# DLL, but most probably this will remain a mystery, I guess...

chambm commented 2 years ago

Your filter syntax is wrong. Each filter must be passed as its own --filter argument, followed by the quoted filter name and its arguments. For example: --filter "peakPicking true 1-" --filter "precursorRefine"
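
When the conversion is scripted, building the argument list programmatically makes it hard to merge two filters into one string by accident. This is a minimal sketch, assuming msconvert.exe is on the PATH; build_msconvert_args is a hypothetical helper, and only the --filter and --mgf options already shown in this thread are used.

```python
def build_msconvert_args(raw_path, filters, exe="msconvert.exe"):
    """Return an argv list with one --filter argument per filter string.

    build_msconvert_args is a hypothetical helper for illustration; it is
    not part of ProteoWizard.
    """
    args = [exe]
    for filter_spec in filters:
        # Each filter gets its own --filter flag; the filter name and its
        # arguments stay together in a single argv element.
        args += ["--filter", filter_spec]
    args += ["--mgf", raw_path]
    return args


args = build_msconvert_args(
    "190128_robin_WT_5.raw",
    ["peakPicking true 1-", "precursorRefine"],
)
print(args)
```

The resulting list can be passed to subprocess.run(args, check=True), which avoids shell quoting issues with paths that contain spaces.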

As long as one of the isotopic peaks is being picked (not some averaged mass, or the wrong peak if there's an interfering peptide), modern search engines should be able to quickly search multiple isotopes to compensate for the wrong monoisotopic peak being picked.
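
To illustrate the idea of searching multiple isotopes, here is a minimal sketch of how a reported precursor m/z can be expanded into candidate monoisotopic m/z values. It assumes the isotopic peaks are spaced by the 13C/12C mass difference divided by the charge. The function names, the default of two offsets, and the 10 ppm tolerance are illustrative assumptions; actual search engines implement this in their own ways.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def isotope_candidates(precursor_mz, charge, max_offset=2):
    """Candidate monoisotopic m/z values, assuming the reported precursor
    may actually be the 1st, 2nd, ... isotopic peak."""
    return [precursor_mz - k * C13_C12_DELTA / charge
            for k in range(max_offset + 1)]


def matches_any_isotope(observed_mz, theoretical_mz, charge,
                        tol_ppm=10.0, max_offset=2):
    """True if theoretical_mz is within tol_ppm of any candidate."""
    for candidate in isotope_candidates(observed_mz, charge, max_offset):
        if abs(candidate - theoretical_mz) / theoretical_mz * 1e6 <= tol_ppm:
            return True
    return False


# Precursor m/z and charge reported by msconvert for scan 58 in this issue.
candidates = isotope_candidates(562.860046386719, charge=2)
for offset, mz in enumerate(candidates):
    print(f"isotope offset {offset}: candidate monoisotopic m/z {mz:.6f}")
```

For a 2+ precursor the candidates are spaced about 0.5017 m/z apart, so a search that tolerates one or two isotope offsets can still match a peptide when the wrong isotopic peak was picked.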