Open simonklaes opened 1 year ago
MS-GF+ looks for the "Centroid Spectrum" CV Param on each scan (it looks for accession 'MS:1000127'): https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/mzml/SpectrumConverter.java#L36 That output is saying that it only found profile scans in that file. Please confirm that the scans are centroided. I have not previously seen this problem when the spectra in the mzML file do include the "Centroid Spectrum" CV Param, but it could happen if for some reason the accession is 'PSI-MS:1000127' (which I have not seen in mzML files, but have seen in mzid files).
cvParam looks fine: <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
If both files contain that (and do not contain a "profile spectrum" entry on the same scan), and MS-GF+ still does not work on only one of them, then we will need to see data files to determine what is happening.
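For anyone who wants to run that check on their own files, here is a minimal sketch in Python (not MS-GF+ code) that counts how many spectra carry the centroid accession 'MS:1000127', the profile accession 'MS:1000128', or both. The function name `count_spectrum_types` is made up for illustration, and the sketch assumes the cvParam sits directly inside each spectrum element; it does not resolve referenceableParamGroup references.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CENTROID = "MS:1000127"  # "centroid spectrum"
PROFILE = "MS:1000128"   # "profile spectrum"


def local_name(tag):
    """Strip an XML namespace such as '{http://psi.hupo.org/ms/mzml}'."""
    return tag.rsplit("}", 1)[-1]


def count_spectrum_types(mzml_source):
    """Count spectra labelled centroid, profile, both, or neither.

    mzml_source is a file path or a binary file object.
    Hypothetical helper for illustration; not part of MS-GF+.
    """
    counts = Counter()
    for _, elem in ET.iterparse(mzml_source, events=("end",)):
        if local_name(elem.tag) != "spectrum":
            continue
        accessions = {
            child.get("accession")
            for child in elem.iter()
            if local_name(child.tag) == "cvParam"
        }
        has_centroid = CENTROID in accessions
        has_profile = PROFILE in accessions
        if has_centroid and has_profile:
            counts["both"] += 1
        elif has_centroid:
            counts["centroid"] += 1
        elif has_profile:
            counts["profile"] += 1
        else:
            counts["unlabeled"] += 1
        elem.clear()  # free memory; spectra can be large
    return counts
```

Running `count_spectrum_types("file.mzML")` on both files and comparing the counts should show whether either file has spectra labelled as profile, or labelled both ways.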
I invited you to my private repository containing the files for reproducing the issue.
What is happening: MS-GF+ does a secondary check on each MSn spectrum, based on the median PPM difference between consecutive peaks. If the median difference is less than 50 PPM, it marks the scan as not centroided. I tested one spectrum, and its median PPM difference is 41.077 PPM. The data looks very clean overall, but most of the data points are clustered tightly together; there is even a case in that scan where 5 consecutive peaks have less than 20 PPM difference between each consecutive pair.
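To see how a given spectrum fares against that threshold, the check can be approximated in Python. This is only a sketch of the idea described above, not the MS-GF+ implementation: the function names are hypothetical, and dividing each gap by the lower m/z is an assumption about how the PPM difference is computed.

```python
from statistics import median


def median_ppm_spacing(mzs):
    """Median PPM gap between consecutive peaks, or None if fewer than 2 peaks.

    Assumption: each gap is expressed relative to the lower m/z of the pair.
    """
    mzs = sorted(mzs)
    gaps = [(hi - lo) / lo * 1e6 for lo, hi in zip(mzs, mzs[1:])]
    return median(gaps) if gaps else None


def passes_density_check(mzs, min_median_ppm=50.0):
    """True if the peak list is sparse enough to be treated as centroided."""
    spacing = median_ppm_spacing(mzs)
    return spacing is None or spacing >= min_median_ppm


# Peaks 0.01 m/z apart near m/z 500 are about 20 PPM apart, so this fails.
dense = [500.00, 500.01, 500.02, 500.03]
# Peaks 1 m/z apart near m/z 500 are about 2000 PPM apart, so this passes.
sparse = [500.0, 501.0, 502.0]
```

A spectrum like the one tested here, with a median spacing of 41.077 PPM, would fall below the 50 PPM cutoff in the same way as `dense`.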
Thanks for the quick reply. I would really appreciate it if the minimum median difference could be set freely or the secondary check could be turned off completely.
I'm looking at adding a parameter to allow ignoring the result of the secondary check if the input file says the spectrum is centroided.
See https://github.com/MSGFPlus/msgfplus/releases/tag/v2023.01.12 The zip file contains an updated MS-GF+ jar file that supports the parameter '-allowDenseCentroidedPeaks 1'. If you run a search without that parameter and centroid spectra are ignored because of the check mentioned previously, they are reported separately and the parameter is mentioned in the output.
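As a rough illustration of passing the new parameter, here is a small Python sketch that assembles a command line for the updated jar. The helper name and all file names are placeholders, and `-s`, `-d`, and `-o` are the usual MS-GF+ options for the spectrum file, database, and output; only `-allowDenseCentroidedPeaks 1` comes from this thread.

```python
def build_msgfplus_command(
    spectrum_file,
    database_file,
    output_file,
    jar_path="MSGFPlus.jar",  # placeholder path to the updated jar
    allow_dense_centroided_peaks=True,
):
    """Return an argument list for running MS-GF+ (hypothetical helper)."""
    cmd = [
        "java", "-jar", jar_path,
        "-s", spectrum_file,
        "-d", database_file,
        "-o", output_file,
    ]
    if allow_dense_centroided_peaks:
        # Trust the file's centroid label even when the peaks are dense.
        cmd += ["-allowDenseCentroidedPeaks", "1"]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; any other search parameters you normally use would need to be appended as well.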
Hello, I am trying to analyze samples that were acquired with (a) a 120 min LC gradient and (b) a 60 min LC gradient. The spectra from (a) work well with MS-GF+. However, the spectra from (b) do not work with MS-GF+ and lead to the error: no valid spectra. Other search engines (OMSSA, SequestHT) work well with both (a) and (b).
MS spectra were acquired on an Orbitrap Fusion with HCD. The RAW file was converted to mzML with msConvert: --filter "peakPicking true [1,2]". MS-GF+ was run via a Galaxy server or SearchGUI.
Standard Output:
Fileinfo says: