Interpretation of different parameters from psm.tsv file for the assessment of identification quality

candidaorelmex commented 1 year ago

Hi, I have a couple of questions about the quality assessment of PSMs that passed all FDR thresholds (1%, except protein at 100% - we are doing immunopeptidomics) but that we still want to inspect in more depth.

Could you elaborate on the meaning of the "Expectation" column in the psm.tsv output, or are there any resources beyond what is explained on the website ("expectation value from statistical modeling with PeptideProphet, lower values indicate higher likelihood")? Though the FDR filter was passed, does an Expectation of e.g. 0.5 make an ID much less reliable? How does this value relate to the "PeptideProphet Probability" (practically, in terms of deciding on the reliability of a mass spec ID)?
Is it possible that the Hyperscore of PSMs identified in DIA files with DIA-Umpire is somehow punished? When we compare the spectra of the same peptide in the Viewer from DDA and DIA files, even though they look quite the same, the hyperscores of DIA PSMs tend to be lower. We observe this for many peptides - is there some kind of DIA-Umpire hyperscore punishment or is this just coincidence?
I tried to predict spectra for an analysis that included 300 raw files, incl. ~100 DIA raw files and ~200 DDA raw files. I let the prediction run for 24h and it still hasn't loaded. Is the Viewer overwhelmed with this task, is our computer not powerful enough (we have 40 cores and 256 GB of RAM...), or do I just need to wait longer? Is there a solution like manually manipulating the .pep.xml files to only contain PSMs of interest? We are not that familiar with PDViewer and are wondering if there is a quick and easy solution.

Thank you in advance for your help!

anesvi commented 1 year ago

Before we send a more detailed response, could you please send us the log file from FragPipe, either here or by email to nesvi at Med.umich.edu? Thanks, Alexey


From: candidaorelmex @.> Sent: Friday, January 27, 2023 5:23:25 AM To: Nesvilab/FragPipe @.> Cc: Subscribed @.***> Subject: [Nesvilab/FragPipe] Interpretation of different parameters from psm.tsv file for the assessment of identification quality (Issue #981)


candidaorelmex commented 1 year ago

I sent you an e-mail (subject: "FragPipe Github issue #981") with the log file, thank you for your help

anesvi commented 1 year ago

Dear Ilja

First, about your last question:

I tried to predict spectra for an analysis that included 300 raw files, incl. ~100 DIA raw files and ~200 DDA raw files. I let the prediction run for 24h and it still hasn't loaded. Is the Viewer overwhelmed with this task, is our computer not powerful enough (we have 40 cores and 256 GB of RAM...), or do I just need to wait longer? Is there a solution like manually manipulating the .pep.xml files to only contain PSMs of interest? We are not that familiar with PDViewer and are wondering if there is a quick and easy solution.

I looked at the log file. FragPipe finished successfully. It is a large dataset because for each DIA raw file we generate 3 pseudo-MS/MS spectral files (Q1, Q2, Q3), and I see that DIA-NN did deep learning prediction for a lot of peptides. So I assume your question is about the PDV viewer not being able to load the DIA-NN predicted spectra. Yes, it is possible that you got too many PSMs for PDV to handle. I see the following numbers:

INFO[02:40:50] Converged to 1.00 % FDR with 1539726 PSMs decoy=15398 threshold=0.880382 total=1555124
INFO[02:40:50] Converged to 1.00 % FDR with 28071 Peptides decoy=281 threshold=0.98512 total=28352
INFO[02:40:50] Converged to 1.00 % FDR with 40553 Ions decoy=405 threshold=0.980482 total=40958

We need to check with Kai whether 1539726 PSMs is too many for the PDV viewer to handle.

Clearly there are many redundant PSMs, since you go from 1539726 PSMs to only 40553 ions. So yes, we should think of ways to remove some of the redundant PSMs if the PDV viewer cannot handle them. One way to reduce the number of PSMs is to restrict the analysis to the Q1 pseudo-MS/MS spectra only. This should be fine given that you have DDA data. With direct DIA alone, Q2 and Q3 may add some peptide IDs, but in your case those peptides are likely to be identified in the DDA data too. Restricting the analysis to Q1 only will cut the number of redundant PSMs.
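To put a number on the redundancy, here is a minimal sketch that parses the three summary lines quoted from the log and divides the PSM count by the ion and peptide counts. The regular expression is written only against the line format shown in this thread; other log versions may differ.

```python
import re

# Summary lines copied from the FragPipe log quoted in this thread.
LOG = """\
INFO[02:40:50] Converged to 1.00 % FDR with 1539726 PSMs decoy=15398 threshold=0.880382 total=1555124
INFO[02:40:50] Converged to 1.00 % FDR with 28071 Peptides decoy=281 threshold=0.98512 total=28352
INFO[02:40:50] Converged to 1.00 % FDR with 40553 Ions decoy=405 threshold=0.980482 total=40958
"""

# Assumption: this pattern matches only the line layout shown above.
PATTERN = re.compile(r"FDR with (\d+) (PSMs|Peptides|Ions) decoy=(\d+)")


def parse_counts(log_text):
    """Return {level: (passing_count, decoy_count)} for each summary line."""
    return {
        m.group(2): (int(m.group(1)), int(m.group(3)))
        for m in PATTERN.finditer(log_text)
    }


counts = parse_counts(LOG)
psms_per_ion = counts["PSMs"][0] / counts["Ions"][0]
psms_per_peptide = counts["PSMs"][0] / counts["Peptides"][0]
print(f"PSMs per ion: {psms_per_ion:.1f}")
print(f"PSMs per peptide: {psms_per_peptide:.1f}")
```

For these numbers that works out to roughly 38 PSMs per ion and 55 per peptide, which is the redundancy described above.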

We may also think about not printing all redundant PSMs as an option in FragPipe. We would need to discuss internally.
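Until such an option exists, one possible workaround is to collapse a copy of psm.tsv to a single best-scoring PSM per ion before inspecting it. The sketch below is not a FragPipe feature. The column names ("Peptide", "Modified Peptide", "Charge", "Hyperscore") are assumptions about the psm.tsv header and should be checked against your own file, the rows are made-up toy data, and whether PDV can load a reduced table is not confirmed in this thread.

```python
import csv
import io


def best_psm_per_ion(rows, score_col="Hyperscore"):
    """Keep the highest-scoring row per (peptide, modified peptide, charge)."""
    best = {}
    for row in rows:
        # Assumed column names; adjust to match the header of your psm.tsv.
        key = (row["Peptide"], row["Modified Peptide"], row["Charge"])
        if key not in best or float(row[score_col]) > float(best[key][score_col]):
            best[key] = row
    return list(best.values())


# Hypothetical toy rows, not taken from any real result file.
EXAMPLE_TSV = (
    "Spectrum\tPeptide\tModified Peptide\tCharge\tHyperscore\n"
    "runA_Q1.0001.0001.2\tPEPTIDEK\t\t2\t25.1\n"
    "runA_Q2.0002.0002.2\tPEPTIDEK\t\t2\t18.3\n"
    "runB.0003.0003.2\tPEPTIDEK\t\t2\t30.7\n"
    "runB.0004.0004.3\tPEPTIDEK\t\t3\t22.0\n"
    "runA_Q1.0005.0005.2\tSAMPLEPEPK\t\t2\t19.5\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
kept = best_psm_per_ion(rows)
for row in kept:
    print(row["Spectrum"], row["Peptide"], row["Charge"], row["Hyperscore"])
```

Ranking by hyperscore is only one choice. Passing a different `score_col` (for example the probability column) would select by that score instead, and ranking by hyperscore will tend to favor DDA spectra over DIA-Umpire pseudo-spectra for the reason discussed further down.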

About the other questions:

Could you elaborate on the meaning of the "Expectation" column in the psm.tsv output, or are there any resources beyond what is explained on the website ("expectation value from statistical modeling with PeptideProphet, lower values indicate higher likelihood")? Though the FDR filter was passed, does an Expectation of e.g. 0.5 make an ID much less reliable? How does this value relate to the "PeptideProphet Probability" (practically, in terms of deciding on the reliability of a mass spec ID)?
There is plenty of literature on what the expectation value is (see e.g. the MSFragger paper). What is shown in the PeptideProphet Probability column is actually 1-Qvalue from Percolator (since Percolator was used instead of PeptideProphet). Percolator (or PeptideProphet, for that matter) uses multiple scores, not just the expectation value. So while an expectation value of 0.5 is not great (the lower the better), the PSM can still pass the FDR filters based on the Percolator modeling, which takes into account this and many other scores.
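For manual follow-up, a minimal sketch like the one below could list the PSMs that passed the FDR filter but still have a comparatively poor expectation value, next to their probability. The column names are assumptions taken from the wording in this thread, the 0.1 cutoff is an arbitrary illustration rather than a recommended threshold, and the rows are made-up toy data.

```python
import csv
import io


def flag_for_inspection(rows, max_expectation=0.1,
                        expect_col="Expectation"):
    """Return rows whose expectation exceeds the cutoff, worst first."""
    flagged = [r for r in rows if float(r[expect_col]) > max_expectation]
    return sorted(flagged, key=lambda r: float(r[expect_col]), reverse=True)


# Hypothetical toy rows; column names are assumed, values are invented.
EXAMPLE_TSV = (
    "Spectrum\tPeptide\tExpectation\tPeptideProphet Probability\n"
    "runA.0001.0001.2\tPEPTIDEK\t0.5\t0.95\n"
    "runA.0002.0002.2\tSAMPLEPEPK\t0.001\t0.999\n"
    "runB.0003.0003.2\tPEPTIDER\t0.2\t0.91\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
flagged = flag_for_inspection(rows)
for row in flagged:
    print(row["Spectrum"], row["Expectation"], row["PeptideProphet Probability"])
```

This only builds a shortlist for looking at spectra by eye. It does not replace the FDR filtering, which already combines the expectation value with the other scores.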
Is it possible that the Hyperscore of PSMs identified in DIA files with DIA-Umpire is somehow punished? When we compare the spectra of the same peptide in the Viewer from DDA and DIA files, even though they look quite the same, the hyperscores of DIA PSMs tend to be lower. We observe this for many peptides - is there some kind of DIA-Umpire hyperscore punishment or is this just coincidence?

There is no intentional punishment. However, DIA-Umpire extracted MS/MS spectra tend to be noisier (more peaks), and MSFragger uses only the top N peaks per spectrum. So it is possible that not all of the matched y/b ions you see in the PDV viewer were used in scoring (to compute the hyperscore) the pseudo-MS/MS spectra from DIA-Umpire. It is therefore not surprising to see a DDA spectrum scoring higher than a DIA spectrum for the same peptide.
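The top-N effect can be illustrated with a toy example. This is not MSFragger's scoring code: the peak lists, the value of N, and the m/z tolerance are all made up, and the sketch only counts how many theoretical fragment peaks survive an intensity-based top-N filter in a clean versus a noisy spectrum.

```python
def top_n_peaks(peaks, n):
    """Keep the n most intense (mz, intensity) peaks, returned sorted by m/z."""
    return sorted(sorted(peaks, key=lambda p: p[1], reverse=True)[:n])


def count_matches(peaks, fragment_mzs, tol=0.02):
    """Count theoretical fragments with at least one peak within tol (in m/z)."""
    return sum(
        any(abs(mz - frag) <= tol for mz, _ in peaks) for frag in fragment_mzs
    )


# Hypothetical fragment m/z values and spectra, for illustration only.
fragments = [200.1, 300.2, 400.3, 500.4]
signal = [(200.1, 100.0), (300.2, 80.0), (400.3, 30.0), (500.4, 20.0)]

clean = signal + [(250.0, 10.0)]
noisy = signal + [(250.0, 50.0), (350.0, 45.0), (450.0, 40.0), (550.0, 35.0)]

N = 5  # illustrative value, not an MSFragger default
clean_matched = count_matches(top_n_peaks(clean, N), fragments)
noisy_matched = count_matches(top_n_peaks(noisy, N), fragments)
print(clean_matched, noisy_matched)  # prints "4 2"
```

In the noisy spectrum, the two weakest fragment peaks are pushed out of the top N by more intense interfering peaks, so fewer matched ions contribute to the score even though all four are still visible when the full spectrum is displayed.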

Best, Alexey

candidaorelmex commented 1 year ago

Dear Alexey,

Thank you for the clarification!

Best, Ilja
