Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

High-probability peptides are being filtered out after protein assignment #570

Closed: weixiandeng closed this issue 1 year ago

weixiandeng commented 2 years ago

Hi Fengchao,

I hope you are having a great holiday!

So, I've been trying to import the interact-*.pep.xml files generated by the MSFragger-DIA + DDA workflow (DIA_Speclib_Quant) into Skyline to build a spectral library for peptide-centric searching. I took the converged 1% ion-level FDR threshold and entered it in Skyline's cut-off score box. With this, I got a much larger spectral library: ~63K peptides compared to 29K from the MSFragger results.

This discrepancy appears only when importing interact-*.pep.xml files from the DIA_Speclib_Quant workflow. In other cases, such as DDA search results, the spectral libraries built in Skyline seem consistent with the MSFragger results.

Looking forward to hearing your opinion on it.

Best, Weixian

fcyu commented 2 years ago

Hi Weixian,

How did you import the DIA interact-*.pep.xml files into Skyline? As far as I know, Skyline does not support FragPipe's DIA interact files because of the _rankX suffix.

BTW, have you imported protein.fas into Skyline too?

Best,

Fengchao

weixiandeng commented 2 years ago

I used Skyline-Daily; maybe that's why it allowed the import? About two weeks ago, I probably imported the PeptideProphet result (the combined.pep.xml) into the Skyline stable version and may have run into the same problem, but I cannot remember the details, so I'm not certain.

I did import protein.fas, but since I was doing peptide-level quantification, I set --prot 1 in the FDR filter.

fcyu commented 2 years ago

I am also using the latest Skyline-daily, but I still get the same error I saw a long time ago:

---------------------------
Skyline-daily
---------------------------
ERROR: No spectra were found for the new library.

Command-line: C:\Users\yufe\AppData\Local\Apps\2.0\181WWENP.RYH\PWGEADGY.2C2\skyl..tion_e4141a2a22107248_0015.0001_9a4bb3d7cf8a899e\BlibBuild -s -A -H -o -c 0.95 -i 1 -S "C:\Users\yufe\AppData\Local\Temp\tmp9D35.tmp" "F:\msfraggerdia\1.redundant.blib"
Working directory: F:\msfraggerdia
---------------------------
OK More Info
---------------------------
System.IO.IOException: ERROR: No spectra were found for the new library.

Command-line: C:\Users\yufe\AppData\Local\Apps\2.0\181WWENP.RYH\PWGEADGY.2C2\skyl..tion_e4141a2a22107248_0015.0001_9a4bb3d7cf8a899e\BlibBuild -s -A -H -o -c 0.95 -i 1 -S "C:\Users\yufe\AppData\Local\Temp\tmp9D35.tmp" "F:\msfraggerdia\1.redundant.blib"
Working directory: F:\msfraggerdia
   at pwiz.Common.SystemUtil.ProcessRunner.Run(ProcessStartInfo psi, String stdin, IProgressMonitor progress, IProgressStatus& status, TextWriter writer, ProcessPriorityClass priorityClass) in C:\proj\skyline_21_2_x64\pwiz_tools\Shared\Common\SystemUtil\ProcessRunner.cs:line 149
   at pwiz.BiblioSpec.BlibBuild.BuildLibrary(LibraryBuildAction libraryBuildAction, IProgressMonitor progressMonitor, IProgressStatus& status, String& commandArgs, String& messageLog, String[]& ambiguous) in C:\proj\skyline_21_2_x64\pwiz_tools\Shared\BiblioSpec\BlibBuild.cs:line 201
   at pwiz.Skyline.Model.Lib.BiblioSpecLiteBuilder.BuildLibrary(IProgressMonitor progress) in C:\proj\skyline_21_2_x64\pwiz_tools\Skyline\Model\Lib\BiblioSpecLiteBuilder.cs:line 157
---------------------------

You wrote: "I probably used PeptideProphet result which is the combined.pep.xml and imported in the Skyline stable version"

Using PeptideProphet's combined.pep.xml would result in fewer peptides, some of them incorrect, because a pep.xml file cannot list different charge states for different ranks from the same scan. We work around this by writing each rank to a separate pep.xml file.
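To illustrate the rank-splitting idea, here is a minimal Python sketch. It assumes PSMs are already available as simple in-memory records (the Psm fields and the output_name naming scheme are hypothetical, loosely mirroring the _rankX suffix mentioned above); it is not FragPipe's actual implementation and writes no XML. It only groups PSMs so that each group contains at most one PSM per scan, which is what lets each group go into its own pep.xml without conflicting charge states.

```python
from collections import defaultdict
from typing import NamedTuple


class Psm(NamedTuple):
    # Hypothetical minimal PSM record, not a FragPipe data structure.
    scan: int
    rank: int
    charge: int
    peptide: str


def split_by_rank(psms: list[Psm]) -> dict[int, list[Psm]]:
    """Group PSMs by rank so each group holds at most one PSM per scan."""
    groups: dict[int, list[Psm]] = defaultdict(list)
    for psm in psms:
        groups[psm.rank].append(psm)
    # Within one rank, a scan must appear only once; otherwise a single
    # pep.xml for that rank would again need two entries for the same scan.
    for rank, members in groups.items():
        scans = [p.scan for p in members]
        if len(scans) != len(set(scans)):
            raise ValueError(f"duplicate scan within rank {rank}")
    return dict(groups)


def output_name(base: str, rank: int) -> str:
    # Hypothetical file naming that mimics the _rankX suffix.
    return f"interact-{base}_rank{rank}.pep.xml"


# Toy data (made up): scan 1001 has a rank-1 hit at charge 2
# and a rank-2 hit at charge 3.
psms = [
    Psm(1001, 1, 2, "PEPTIDEK"),
    Psm(1001, 2, 3, "ANOTHERPEPTIDER"),
    Psm(1002, 1, 2, "SAMPLER"),
]
for rank, members in sorted(split_by_rank(psms).items()):
    print(output_name("run1", rank), [p.peptide for p in members])
```

In this toy example, the two hits from scan 1001 end up in different files, so neither file has to represent two charge states for the same scan.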

Could you please use Percolator and try again? If the discrepancy is still there, could you send us the log file?

Best,

Fengchao

weixiandeng commented 2 years ago

log_2021-12-29_12-43-36.txt

I did use Percolator for this Skyline-Daily case and tried importing the interact-*.pep.xml files again. The import worked without problems, but the large discrepancy in peptide numbers persists.

fcyu commented 2 years ago

OK, it looks like when raw files are used in FragPipe, the DIA interact-*.pep.xml files can be imported into Skyline because Skyline looks for the _uncalibrated.mgf files. When mzML files are used, Skyline reports errors because there is no _uncalibrated.mgf file. Skyline is supposed to use the mzML files. I will submit a ticket to the Skyline support forum.

I also reproduced this issue with two DIA runs on my computer. I think there are bugs in both FragPipe's speclib module and Skyline. Let me try to explain it as clearly as I can.

EasyPQP:

  1. Many peptides have high probabilities but are missing from FragPipe's library. These peptides are also in protein.fas, which means they passed the protein FDR filter. I will put the links to the files at the end of this reply. @guoci, can you take a look?

Skyline:

  1. Some peptides with probabilities below the threshold are still in Skyline's library. I will submit a ticket to Skyline.

Here (https://www.dropbox.com/s/5dbe2q132fplppz/Book2.xlsx?dl=0) is the Excel file listing the discrepant peptides with their probabilities. It includes the high-probability peptides missing from FragPipe's library, as well as the low-probability peptides present in Skyline's library.
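The two kinds of discrepancy described above can be expressed as simple set comparisons. The following Python sketch assumes you already have a peptide-to-probability mapping plus the peptide sets of both libraries in memory; the peptides, probabilities, and the 0.95 threshold are made-up toy values (in practice the threshold would come from the converged FDR in the FragPipe log).

```python
def find_discrepancies(
    probabilities: dict[str, float],
    fragpipe_lib: set[str],
    skyline_lib: set[str],
    threshold: float,
) -> tuple[list[str], list[str]]:
    """Return (high-probability peptides missing from FragPipe's library,
    below-threshold peptides present in Skyline's library)."""
    missing_from_fragpipe = sorted(
        pep
        for pep, prob in probabilities.items()
        if prob >= threshold and pep not in fragpipe_lib
    )
    # Peptides with no known probability are treated as 0.0,
    # i.e. they also count as below the threshold.
    below_threshold_in_skyline = sorted(
        pep for pep in skyline_lib if probabilities.get(pep, 0.0) < threshold
    )
    return missing_from_fragpipe, below_threshold_in_skyline


# Toy data (made up) for illustration only.
probs = {"AAAK": 0.999, "CCCR": 0.42, "DDDK": 0.98}
fragpipe_lib = {"DDDK"}
skyline_lib = {"AAAK", "CCCR", "DDDK"}
missing, low = find_discrepancies(probs, fragpipe_lib, skyline_lib, 0.95)
print("high probability, not in FragPipe library:", missing)
print("below threshold, in Skyline library:", low)
```

Real peptide strings would also need consistent handling of modifications before comparison; this sketch compares plain strings only.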

This folder (https://www.dropbox.com/sh/qc3z8ive4vh5wt7/AABnlEQvqNjysOG2bixl-2wwa?dl=0) contains all the files from FragPipe and Skyline.

Happy New Year,

Fengchao

fcyu commented 2 years ago

After some discussion with Nick here (https://skyline.ms/announcements/home/support/thread.view?rowId=54110), most of the puzzles have been solved.

First, Skyline looks for the peptideprophet_summary element to decide whether to use probability as the threshold. Because that element was missing from the interact.pep.xml files converted from the pin files, Skyline fell back to using expect. After adding peptideprophet_summary, Skyline works as expected. Here (https://www.dropbox.com/s/fn8qwgr3304jw9a/FragPipe-17.2-build8.zip?dl=1) is the pre-release with the fix.
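A quick way to check whether a given interact.pep.xml carries the peptideprophet_summary element is to scan it with Python's standard XML parser. The sketch below is a simplified model of the behavior described above (probability if the element is present, otherwise expect); it is not Skyline's or BiblioSpec's actual code, and the two pep.xml fragments are minimal hand-written examples rather than real FragPipe output.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Strip an XML namespace prefix such as "{http://...}" from a tag.
    return tag.rsplit("}", 1)[-1]


def choose_score_type(pepxml_text: str) -> str:
    """Simplified model: use probability if any peptideprophet_summary
    element is present, otherwise fall back to expect."""
    root = ET.fromstring(pepxml_text)
    has_summary = any(
        local_name(el.tag) == "peptideprophet_summary" for el in root.iter()
    )
    return "probability" if has_summary else "expect"


# Minimal hand-written fragments (not real FragPipe output).
with_summary = """<msms_pipeline_analysis xmlns="http://regis-web.systemsbiology.net/pepXML">
  <analysis_summary analysis="peptideprophet">
    <peptideprophet_summary/>
  </analysis_summary>
</msms_pipeline_analysis>"""
without_summary = """<msms_pipeline_analysis xmlns="http://regis-web.systemsbiology.net/pepXML"/>"""

print(choose_score_type(with_summary))
print(choose_score_type(without_summary))
```

For large real files, ET.iterparse would avoid loading the whole document into memory.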

When generating the spectral library, FragPipe filters out peptides that are not in peptide.tsv. The FDR threshold for peptide.tsv is more stringent than that for ion.tsv, and this difference accounts for part of the discrepancy between Skyline's and FragPipe's libraries. Should we update the tutorial here (https://fragpipe.nesvilab.org/docs/tutorial_skyline.html) to use the peptide-level threshold?
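The effect of that extra peptide.tsv filter can be sketched as follows. This is a toy Python model, assuming ion-level entries (peptide, charge, probability) that already passed the ion-level filter and a set of peptides taken from peptide.tsv; the Ion record and all values are hypothetical, and no TSV column names are assumed. Ions whose peptide is absent from peptide.tsv are dropped even if their own probability is high.

```python
from typing import NamedTuple


class Ion(NamedTuple):
    # Hypothetical ion-level entry; not FragPipe's ion.tsv schema.
    peptide: str
    charge: int
    probability: float


def split_by_peptide_list(
    ions: list[Ion], peptide_tsv_peptides: set[str]
) -> tuple[list[Ion], list[Ion]]:
    """Keep ions whose peptide also passed the peptide-level filter."""
    kept = [ion for ion in ions if ion.peptide in peptide_tsv_peptides]
    dropped = [ion for ion in ions if ion.peptide not in peptide_tsv_peptides]
    return kept, dropped


# Toy data (made up): three ions from two peptides passed the ion-level
# filter, but only AAAK is listed in peptide.tsv.
ions = [Ion("AAAK", 2, 0.999), Ion("AAAK", 3, 0.97), Ion("EEEK", 2, 0.91)]
kept, dropped = split_by_peptide_list(ions, {"AAAK"})
print("kept:", kept)
print("dropped:", dropped)
```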

Comparing Skyline's library against ion.tsv, some peptides are still missing from ion.tsv. I checked some of them, and they were not in protein.fas, so it looks like they were filtered out by the protein FDR. Some of these peptides have very good probabilities (e.g. 1 or 0.9999), and I cannot understand why they did not pass the protein FDR filter. I am using the FASTA file downloaded by Philosopher, so the database format should not be an issue. Below is a screenshot of some of those peptides; the score is the probability.
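Checking whether a peptide is covered by protein.fas can be done with a plain substring search over the FASTA sequences. The Python sketch below assumes unmodified peptide sequences and exact matching; a real check would need to strip modification annotations and may need to treat I and L as equivalent. The FASTA text and peptides are made-up toy data.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {first word of header: sequence}."""
    proteins: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                proteins[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        proteins[header] = "".join(chunks)
    return proteins


def peptides_not_in_fasta(peptides: list[str], proteins: dict[str, str]) -> list[str]:
    """Return peptides that are not an exact substring of any protein."""
    return [
        pep for pep in peptides if not any(pep in seq for seq in proteins.values())
    ]


# Toy FASTA (made up); the first sequence is split across two lines.
fasta_text = (
    ">sp|P00001|TEST1 toy protein\nMAAAKPEP\nTIDEKR\n"
    ">sp|P00002|TEST2 toy protein\nMSSSGGGR\n"
)
proteins = read_fasta(fasta_text)
print(peptides_not_in_fasta(["PEPTIDEK", "GGGR", "WWWK"], proteins))
```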

[Screenshot of some of the peptides missing from ion.tsv, with their probabilities]

I put the Excel file with the peptide list here (RefSpectra.zip).

Best,

Fengchao

weixiandeng commented 2 years ago

Thanks for sorting this out!

This is very helpful!