Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

Discrepancies in charge state assignment and PTM detection when reanalyzing data with FragPipe #1581

Closed Mathieu-Maessen closed 3 months ago

Mathieu-Maessen commented 4 months ago

Bug_report.zip

Greetings FragPipe team,

I am currently using FragPipe to reanalyze data in order to uncover previously undetected PTMs. (The original data was analyzed using MS-GF+.) Unfortunately, I am running into some issues. When I run FragPipe with Percolator, it returns “Found 0 test set PSMs with q<0.01” for certain entries.

I double checked;

I did some exploratory analysis of the data. I ran FragPipe with 100% FDR and compared its .pepXML file to the .mzid file from the original analysis. There is barely any overlap between the peptides identified for the same scans. When reviewing the spectra, I saw that the two tools annotated different peaks. I suspect this is because FragPipe assigns different charges to the same scans, which in turn leads to different neutral masses. Moreover, it seems to prefer charges 3 and 4. When I enabled the “override charge with precursor charge” setting and set it to 2 and 3, it assigned only a charge of 3 to every scan. What could cause such a shift in charge? The relevant documentation is compiled in the attached zip.
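To illustrate why a different charge assignment changes which peptides can match a scan, here is a minimal Python sketch of the standard relation between an observed precursor m/z, an assumed charge z, and the implied neutral mass, M = (m/z − m_proton) · z. The function names and the example m/z of 650.0 are hypothetical and are not taken from the attached files or from FragPipe itself.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass implied by an observed m/z and an assumed positive charge."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (mz - PROTON_MASS) * charge


def candidate_masses(mz: float, charges=(2, 3, 4)) -> dict:
    """Neutral mass for each candidate charge state of the same precursor m/z."""
    return {z: neutral_mass(mz, z) for z in charges}


if __name__ == "__main__":
    # Hypothetical precursor m/z, used only for illustration.
    for z, mass in candidate_masses(650.0).items():
        print(f"z={z}: neutral mass {mass:.4f} Da")
```

The same m/z yields neutral masses that differ by hundreds of Da between charge 2 and charge 3, so a search engine that assumes a different charge will consider an entirely different set of candidate peptides for that scan.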

Thank you in advance,

Mathieu M

fcyu commented 4 months ago

Hi Mathieu,

First of all, could you try the built-in Open workflow if you want to perform an open search? Second, your data seems to have low-resolution MS/MS, which is not suitable for an open search. Last, does your sample have any enriched PTMs or chemical labelling, and was it digested with trypsin?

Thanks,

Fengchao

Mathieu-Maessen commented 4 months ago

Hi Fengchao,

Thanks for the quick response!

Regarding your advice:

Built-in Open Workflow: As a sanity check, I reran the entry using the built-in Open workflow both with and without Percolator. Unfortunately, this did not resolve the issue—no relevant results were obtained.

Low-Resolution MS/MS Data: I also reran the entry using the "Default" closed settings. This similarly resulted in no meaningful outcomes, with different peptides still being annotated for the same scans.

PTM Enrichment or Chemical Labeling: Upon further review, I can confirm that there is no PTM enrichment or chemical labeling in the data. This absence of labeling is also consistent with the mzid file and the methods section, which do not mention any such modifications.

Given these points, I am still encountering the issue. Could you provide any additional insights or parameters I should consider to resolve this?

Thank you for your help.

Best regards,

Mathieu

fcyu commented 4 months ago

Hi Mathieu,

Thanks for the answers. That is quite strange, then. Could you share one of your raw files and the FASTA file with us? I will send you a link to upload the data if you need one.

Thanks,

Fengchao

Mathieu-Maessen commented 4 months ago

Hi Fengchao,

If you could send me a link that would be super, thank you.

Greetings,

Mathieu

fcyu commented 4 months ago

Sure, here it is https://www.dropbox.com/request/dUONz9bshVFix37BcxCR

Best,

Fengchao

fcyu commented 4 months ago

Hi Mathieu,

Thanks for sharing the files. I have tested using both FragPipe and MS-GF+. Both tools got almost 0 IDs.

For reference, since you said you used MS-GF+, here is the results table from MS-GF+. As you can see, with 1% QValue filtering there are only 2 PSMs: YR_AP_002_LTQ1_19April06_Draco_06-01-11.zip

If you believe that you got many IDs with MS-GF+, could you share your command or log file?

Thanks,

Fengchao

Mathieu-Maessen commented 3 months ago

Hello Fengchao,

Thank you for testing the files.
I have compared the PSMs between the MS-GF+ output you sent and the original results. The original results contain many more PSMs in total, 58 of which have a q-value of 0. Two of those 58 match the two PSMs you found in terms of peptide sequence. All other sequences differ, even though the precursor masses are the same. Unfortunately, I do not have the original command or log files produced by MS-GF+ for these identifications, because they were produced by the Pacific Northwest National Laboratory (PNNL), whose data I am currently reanalyzing. I do, however, have the mzid file, which contains all the PSMs as well as the search parameters. The initial analysis by PNNL was conducted using MS-GF+ version v9979.
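A per-scan comparison of this kind can be sketched roughly as follows in Python. This is only an illustration: the `PSM` record, the function names, and the toy values are all hypothetical, and parsing the mzid and MS-GF+ table into such records is assumed to have been done elsewhere.

```python
from typing import Iterable, NamedTuple


class PSM(NamedTuple):
    """Hypothetical minimal PSM record; real files carry many more fields."""
    scan: int
    peptide: str
    charge: int
    q_value: float


def best_by_scan(psms: Iterable[PSM], max_q: float = 0.01) -> dict:
    """Keep, per scan, the PSM with the lowest q-value that passes max_q."""
    best = {}
    for psm in psms:
        if psm.q_value > max_q:
            continue
        if psm.scan not in best or psm.q_value < best[psm.scan].q_value:
            best[psm.scan] = psm
    return best


def compare(a: Iterable[PSM], b: Iterable[PSM], max_q: float = 0.01) -> dict:
    """Summarize agreement between two result sets on the scans they share."""
    best_a, best_b = best_by_scan(a, max_q), best_by_scan(b, max_q)
    shared = sorted(best_a.keys() & best_b.keys())
    return {
        "shared_scans": shared,
        "same_peptide": [s for s in shared if best_a[s].peptide == best_b[s].peptide],
        "different_charge": [s for s in shared if best_a[s].charge != best_b[s].charge],
    }


if __name__ == "__main__":
    # Invented toy data, not values from the shared files.
    original = [PSM(1, "PEPTIDEK", 2, 0.0), PSM(2, "AAAK", 2, 0.0), PSM(3, "LLLR", 3, 0.5)]
    rerun = [PSM(1, "PEPTIDEK", 2, 0.001), PSM(2, "GGGR", 3, 0.005), PSM(4, "VVVK", 2, 0.0)]
    print(compare(original, rerun))
```

Listing the scans where the charge differs next to the scans where the peptide differs makes it easier to see whether the sequence mismatches go together with charge mismatches.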

Kind regards,

Mathieu M

FC_2.zip