compomics / peptide-shaker

Interpretation of proteomics identification results
http://compomics.github.io/projects/peptide-shaker.html

Spectrum not found when parsing sage results #542

Open: samgregoire opened this issue 6 days ago

samgregoire commented 6 days ago

Hello. I'm trying to use PeptideShaker to process identification files obtained by SearchGUI using several search engines (Comet, OMSSA, Tide, X! Tandem and Sage). PeptideShaker encounters the following error when trying to process the Sage output file.

Report:

------------------------------------------------------------------

PeptideShaker 3.0.11 Report File

Originally saved by: sgregoire @ Precision-5820-Tower-X-Series

on: 22 Oct 2024, 20:17

------------------------------------------------------------------

Tue Oct 22 20:17:23 CEST 2024 Unzipping searchgui_out.zip.
Tue Oct 22 20:17:29 CEST 2024 Import process for scPTM1

Tue Oct 22 20:17:29 CEST 2024 Importing sequences from uniprotkb_proteome_UP000005640_AND_revi_2024_08_26.fasta.
Tue Oct 22 20:17:30 CEST 2024 Importing gene mappings.
Tue Oct 22 20:17:31 CEST 2024 Establishing local database connection.
Tue Oct 22 20:17:31 CEST 2024 Reading identification files.
Tue Oct 22 20:17:31 CEST 2024 Parsing 240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.comet.pep.xml.gz.
Tue Oct 22 20:17:31 CEST 2024 No PSM found in 240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.comet.pep.xml.gz.
Tue Oct 22 20:17:31 CEST 2024 Parsing 240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.omx.gz.
Tue Oct 22 20:17:51 CEST 2024 Checking spectra for 240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.omx.gz.
Tue Oct 22 20:17:51 CEST 2024 Importing PSMs from 240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.omx.gz
Tue Oct 22 20:17:55 CEST 2024 744 identified spectra (2.9%) did not present a valid peptide.
Tue Oct 22 20:17:55 CEST 2024 3745 of the best scoring peptides were excluded by the import filters:
Tue Oct 22 20:17:55 CEST 2024 - 67.6% peptide length less than 7 or greater than 30.
Tue Oct 22 20:17:55 CEST 2024 - 32.4% peptide presenting high mass or isotopic deviation.
Tue Oct 22 20:17:55 CEST 2024 Parsing 240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.sage.tsv.gz.
Tue Oct 22 20:17:55 CEST 2024 Checking spectra for 240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.sage.tsv.gz.
Tue Oct 22 20:17:55 CEST 2024 Spectrum with title 'frame=9411 scan=411' in file named '240726_MClassChip_24_001721_CBio11_LSM_1_1_4350' required to parse '240726_MClassChip_24_001721_CBio11_LSM_1_1_4350.sage.tsv.gz' not found.

Tue Oct 22 20:17:55 CEST 2024 Importing Data Canceled!

Please ignore the fact that no PSMs were found in the Comet file; I encountered an error during the Comet search.

When I run PeptideShaker without the Sage identification file, everything seems to work fine. When I process the Sage identification file alone, I run into the same error. I have checked that the mzML file I'm using does contain a scan titled 'frame=9411 scan=411'. Removing this specific PSM from the identification files only changes the error message to another scan title.
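To double-check this on a file of this size without loading it into memory, the mzML can be streamed element by element. Below is a minimal sketch using only Python's standard library. It assumes that the title PeptideShaker reports corresponds to the space-separated key=value tokens of the `id` attribute of each `<spectrum>` element (this mapping is an assumption, not something confirmed in this thread), and `find_spectrum_ids` is a hypothetical helper name.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def find_spectrum_ids(path, title):
    """Stream an mzML file and return (number of spectra, list of matching ids).

    Assumption: the title reported by PeptideShaker corresponds to the
    space-separated key=value tokens of the <spectrum> 'id' attribute.
    """
    wanted = set(title.split())
    total = 0
    matches = []
    for _event, elem in ET.iterparse(path, events=("end",)):
        name = local_name(elem.tag)
        if name == "spectrum":
            total += 1
            spec_id = elem.get("id", "")
            # Token-wise match, so 'scan=411' does not also match 'scan=4110'.
            if wanted <= set(spec_id.split()):
                matches.append(spec_id)
            elem.clear()  # drop this spectrum's children (peak arrays) to limit memory use
        elif name == "chromatogram":
            elem.clear()
    return total, matches


if __name__ == "__main__" and len(sys.argv) == 3:
    n, hits = find_spectrum_ids(sys.argv[1], sys.argv[2])
    print(f"{n} spectra read, {len(hits)} matching id(s):")
    for hit in hits:
        print("  ", hit)
```

Usage, with a hypothetical script and file name: `python find_spectrum_ids.py your_file.mzML "frame=9411 scan=411"`. Reporting the number of matches, rather than stopping at the first one, also shows whether the same id occurs more than once in the file.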

Please let me know if I can provide any other useful information.

Thanks, Sam

EDIT: Perhaps the error encountered during the Comet search is actually related to this issue? Comet gives me the following error message:

Load spectra:free(): invalid pointer

I don't really understand what this means, but it seems to be related to spectrum loading, which could connect it to the issue described above.

samgregoire commented 6 days ago

I forgot to mention that I did some initial testing on a much smaller mzML file (2 GB, from a Thermo Orbitrap MS) than the one I'm having the problem with (59 GB, from a Bruker timsTOF MS), and that both Comet and Sage worked correctly on it.

hbarsnes commented 4 days ago

I'm afraid that neither SearchGUI nor PeptideShaker was developed for spectrum files as big as 59 GB, so the issues you are seeing may simply be due to the size of the mzML file. Would it be an option to convert the mzML file to mgf and see whether that makes a difference?

As for the invalid pointer error, it comes from inside Comet, and I'd recommend contacting the Comet developers directly to see if anything can be done about it: https://groups.google.com/g/comet-ms.

I assume there were no similar errors when running Sage? You may nevertheless want to contact the Sage developers (https://github.com/lazear/sage) to check whether there are any limitations with regard to very large mzML files.