compomics / compomics-utilities

Open source Java library for computational proteomics
http://compomics.github.io/projects/compomics-utilities.html

Parsing comet pep.xml with sparse MGF spectrum file #11

Closed jj-umn closed 8 years ago

jj-umn commented 8 years ago

I am trying to help a user with a PeptideShakerCLI run whose MGF inputs are sparse, i.e., the scan number in the TITLE does not represent the ordinal location of the spectrum in the file.

For example, here is a spectrum_query element from the Comet pep.xml:

<spectrum_query spectrum="Mascot formatted MGF of data 10.74772.74772.3" spectrumNativeID="_x0032_0160111_ERLIC_MCF7_ingel_digest_band110kb_replicate1_017.74772.74772.3" start_scan="74772" end_scan="74772" precursor_neutral_mass="2077.794029" assumed_charge="3" index="66920" retention_time_sec="0.0">

The corresponding MGF contains only 74771 spectra; scan "74772" should have zero-based index 68034:

grep '^TITLE' "searchgui_input/data/Mascot formatted MGF of data 10.mgf" | grep -n '^TITLE' | grep 74772
68035:TITLE=_x0032_0160111_ERLIC_MCF7_ingel_digest_band110kb_replicate1_017.74772.74772.3

java.lang.IndexOutOfBoundsException: Index: 74771, Size: 74771
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at com.compomics.util.experiment.io.massspectrometry.MgfIndex.getSpectrumTitle(MgfIndex.java:239)
    at com.compomics.util.experiment.massspectrometry.SpectrumFactory.getSpectrumTitle(SpectrumFactory.java:992)
    at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importSpectrum(FileImporter.java:999)
    at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importPsms(FileImporter.java:727)
    at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importFiles(FileImporter.java:482)
    at eu.isas.peptideshaker.fileimport.FileImporter.importFiles(FileImporter.java:158)
    at eu.isas.peptideshaker.PeptideShaker.importFiles(PeptideShaker.java:232)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.createProject(PeptideShakerCLI.java:696)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.call(PeptideShakerCLI.java:205)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.main(PeptideShakerCLI.java:908)
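
To make the mismatch concrete, here is a minimal sketch of the lookup the grep pipeline performs: it finds a spectrum's zero-based position by its TITLE rather than by its scan number. The class MgfTitleIndex and its methods are hypothetical names used only for illustration; they are not part of compomics-utilities or PeptideShaker, and the sketch assumes one TITLE line per spectrum.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an existing compomics-utilities class.
class MgfTitleIndex {

    /**
     * Maps each TITLE value to its zero-based position among the TITLE lines,
     * i.e. the line count that grep -n reports, minus one.
     * Assumes every spectrum in the MGF has exactly one TITLE line.
     */
    static Map<String, Integer> indexTitles(Iterable<String> mgfLines) {
        Map<String, Integer> positions = new HashMap<>();
        int next = 0;
        for (String line : mgfLines) {
            if (line.startsWith("TITLE=")) {
                String title = line.substring("TITLE=".length()).trim();
                // Keep the first position if a title happens to be duplicated.
                positions.putIfAbsent(title, next);
                next++;
            }
        }
        return positions;
    }

    /** Returns the zero-based position for a title, or -1 if no spectrum has that title. */
    static int positionOf(Map<String, Integer> positions, String title) {
        Integer position = positions.get(title);
        return position == null ? -1 : position;
    }
}
```

Fed the lines of the MGF above (for example via java.nio.file.Files.readAllLines), positionOf with the title from spectrumNativeID should return the position that the grep output implies, instead of a value derived from start_scan.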

Comet puts the actual TITLE from the MGF into an attribute named "spectrumNativeID", so I tried to use that to correct the scanNumber with the code below, but it failed because spectrumFactory.fileLoaded(inputFileName) is false at that point.

Any suggestions for handling this better?

diff --git a/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java b/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java
index aa49e30..862f284 100644
--- a/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java
+++ b/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java
@@ -475,6 +475,7 @@ public class PepxmlIdfileReader implements IdfileReader {

     Integer scanNumber = null;
     String spectrumId = null;

hbarsnes commented 8 years ago

Thanks for telling us about this. To follow up on this issue, please see https://github.com/compomics/peptide-shaker/issues/157.