Closed PatrickvanZalm closed 1 year ago
Hi Patrick,
It is a known issue. Can you try the msbooster-1.0.zip from https://github.com/Nesvilab/FragPipe/issues/533#issuecomment-973283953 ? You need to replace the one in the fragpipe/tools folder.
Best,
Fengchao
Hi Fengchao,
Thanks for your prompt reply. I replaced the msbooster as you suggested, which seems to have done the trick. However, I now run into a new issue related to philosopher filter.
I've run these samples before using PeptideProphet instead of MSBooster+Percolator without issue.
Hi Patrick,
It looks like something was wrong with Philosopher. Can you try the latest Philosopher (4.1.1)? If the latest version still gives the error, can you try using PeptideProphet (but still with the latest version) since you said it is OK with PeptideProphet?
Best,
Fengchao
Sure, I'll give updating a shot.
MSBooster & Philosopher do not work together, right?
MSBooster needs Percolator. MSBooster cannot work with PeptideProphet. Percolator or PeptideProphet can work alone without MSBooster.
It is not about Philosopher, actually. But since your error is from the Philosopher filter command, you need to upgrade your Philosopher in the "config" tab.
Best,
Fengchao
Thanks,
I've just updated Philosopher to 4.1.1, but the issue persists.
I'll just use PeptideProphet for now. I'd love to try out MSBooster+Percolator, so please let me know if you'd like me to run some tests.
Hi Patrick,
Can you confirm that PeptideProphet works with Philosopher 4.1.1, but MSBooster+Percolator does not?
What I cannot understand is why choosing PeptideProphet or MSBooster+Percolator makes any difference to the Philosopher filter command. There should be no difference from the filter's point of view.
Best,
Fengchao
I just finished running with PeptideProphet without any issue.
I could try running Percolator without MSBooster; that might point in the right direction? Let me know if you'd like me to try.
Hi Patrick,
Can you send me the log from the one with PeptideProphet?
Thanks,
Fengchao
Thanks for your file. I have no clue now. Need to ask Felipe to take a look.
Hi Felipe @prvst ,
Can you take a look when you have time? Same data, one analysis with MSBooster+Percolator and the other with PeptideProphet. The Philosopher filter command worked well with PeptideProphet but failed with MSBooster+Percolator:
PhilosopherFilter [Work dir: D:\Patrick\JHU_DIA\SpecLibrary\JHU_PLASMAandCSF_PRO]
B:\fragpipe17\tools\philosopher\philosopher.exe filter --picked --prot 0.01 --tag rev_ --pepxml D:\Patrick\JHU_DIA\SpecLibrary\JHU_PLASMAandCSF_PRO --protxml D:\Patrick\JHU_DIA\SpecLibrary\JHU_PLASMAandCSF_PRO\combined.prot.xml --razor
INFO[20:33:28] Executing Filter v4.1.0
INFO[20:33:28] Processing peptide identification files
INFO[20:43:41] 1+ Charge profile decoy=31851 target=46148
INFO[20:43:41] 2+ Charge profile decoy=521085 target=1863338
INFO[20:43:41] 3+ Charge profile decoy=301623 target=916136
INFO[20:43:41] 4+ Charge profile decoy=164773 target=281902
INFO[20:43:42] 5+ Charge profile decoy=49536 target=57787
INFO[20:43:42] 6+ Charge profile decoy=0 target=0
INFO[20:43:59] Database search results ions=879891 peptides=718903 psms=4234179
INFO[20:44:23] Converged to 1.00 % FDR with 2001255 PSMs decoy=20202 threshold=0.743092 total=2.021457e+06
INFO[20:46:04] Converged to 1.00 % FDR with 15312 Peptides decoy=154 threshold=0.994341 total=15466
INFO[20:46:09] Converged to 1.00 % FDR with 21052 Ions decoy=212 threshold=0.991293 total=21264
INFO[20:46:11] Protein inference results decoy=5825 target=8317
panic: runtime error: index out of range [2723] with length 2723
goroutine 1 [running]:
philosopher/lib/fil.ProtXMLFilter(0x0, 0x0, 0xc000018110, 0x4, 0xc1750d8000, 0x3516, 0x3733, 0xc3595513c0, 0x1f, 0x3f847ae147ae147b, ...)
/workspace/philosopher/lib/fil/fdr.go:544 +0x1b32
philosopher/lib/fil.ProcessProteinIdentifications(0x0, 0x0, 0xc000018110, 0x4, 0xc1750d8000, 0x3516, 0x3733, 0xc3595513c0, 0x1f, 0x3f847ae147ae147b, ...)
/workspace/philosopher/lib/fil/fil.go:551 +0x345
philosopher/lib/fil.Run(0xc00001d650, 0x24, 0xc0001a2bc0, 0x33, 0xc000024960, 0x46, 0xc0000249b0, 0x42, 0xc0001a2c00, 0x39, ...)
/workspace/philosopher/lib/fil/fil.go:73 +0x2396
philosopher/cmd.glob..func5(0x868ec60, 0xc000202780, 0x0, 0xa)
/workspace/philosopher/cmd/filter.go:43 +0x49e
github.com/spf13/cobra.(*Command).execute(0x868ec60, 0xc000202640, 0xa, 0xa, 0x868ec60, 0xc000202640)
/home/prvst/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:844 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0x868da00, 0x3dbb01, 0x0, 0x0)
/home/prvst/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:945 +0x336
github.com/spf13/cobra.(*Command).Execute(...)
/home/prvst/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:885
philosopher/cmd.Execute()
/workspace/philosopher/cmd/root.go:35 +0x34
main.main()
/workspace/philosopher/main.go:22 +0x75
Process 'PhilosopherFilter' finished, exit code: 2
Process returned non-zero exit code, stopping
From my understanding, whether the input comes from MSBooster+Percolator or PeptideProphet should not affect the behavior of the Philosopher filter command, since both produce the same file format. Since MSBooster+Percolator gave more IDs than PeptideProphet, it might have triggered some bug in Philosopher.
Best.
Fengchao
It looks to be related to the protein XML files. Could you send me your pep.xml, prot.xml, and FASTA file, please?
Hi Felipe,
I've just tried to share the files through Google Drive. I hope you got the mail.
I just tried running without MSBooster, but with Percolator and this ran without any issue.
Hi Patrick,
Can you also share the Google Drive folder with me?
Thanks,
Fengchao
Sure. I believe I did add you too now.
I can second that problem; my semi-specific N-terminal searches seem to work fine if I have fewer files, but if I have a lot of files, I get the same index error. I will try with PeptideProphet to see whether it solves the problem.
Thanks Fatih for your information. Yes, you were having the same issue. Please let us know if using PeptideProphet can resolve this problem.
Best,
Fengchao
Are the results from the shared folder from MSBooster? I was able to go through the filtering part, I just couldn't finish because my 32GB RAM machine does not have the necessary amount of RAM to process everything.
INFO[13:55:07] 1+ Charge profile decoy=6194 target=18062
INFO[13:55:08] 2+ Charge profile decoy=56527 target=1301574
INFO[13:55:08] 3+ Charge profile decoy=24126 target=603174
INFO[13:55:08] 4+ Charge profile decoy=6859 target=114737
INFO[13:55:08] 5+ Charge profile decoy=867 target=6210
INFO[13:55:08] 6+ Charge profile decoy=0 target=0
INFO[13:55:16] Database search results ions=68866 peptides=60405 psms=2138330
INFO[13:55:26] Converged to 1.00 % FDR with 1755337 PSMs decoy=17719 threshold=0.8587 total=1.773056e+06
INFO[13:55:41] Converged to 0.99 % FDR with 12222 Peptides decoy=122 threshold=0.9972 total=12344
INFO[13:55:41] Converged to 0.99 % FDR with 16579 Ions decoy=165 threshold=0.9964 total=16744
INFO[13:55:42] Protein inference results decoy=3758 target=6489
INFO[13:55:42] Converged to 1.07 % FDR with 2343 Proteins decoy=25 threshold=0.998 total=2368
INFO[13:56:44] Applying sequential FDR estimation ions=20526 peptides=15196 psms=1745910
INFO[13:56:51] Converged to 0.30 % FDR with 1740638 PSMs decoy=5272 threshold=0.8587 total=1.74591e+06
INFO[13:57:09] Converged to 0.20 % FDR with 15165 Peptides decoy=31 threshold=0.8588 total=15196
INFO[13:57:09] Converged to 0.21 % FDR with 20482 Ions decoy=44 threshold=0.8588 total=20526
Killed
Thanks for pointing it out. I just tried and for me it also worked. I must have mixed up some folders.
Let me re-create the issue. I'll try to re-upload it and let you know.
Excuse me for the confusion.
I have just re-uploaded a .zip to our Gsuite and invited both of you.
Thank you, Patrick
I tested again with your new files, and the program is telling me that you don't have enough proteins with a good score:
INFO[10:24:46] Executing Filter v4.1.1
INFO[10:24:46] Processing peptide identification files
INFO[10:24:47] 1+ Charge profile decoy=137 target=193
INFO[10:24:47] 2+ Charge profile decoy=2451 target=9786
INFO[10:24:47] 3+ Charge profile decoy=1365 target=4373
INFO[10:24:47] 4+ Charge profile decoy=685 target=1122
INFO[10:24:47] 5+ Charge profile decoy=174 target=188
INFO[10:24:47] 6+ Charge profile decoy=0 target=0
INFO[10:24:47] Database search results ions=13222 peptides=12041 psms=20474
INFO[10:24:47] Converged to 1.00 % FDR with 10807 PSMs decoy=109 threshold=0.715347 total=10916
INFO[10:24:48] Converged to 0.98 % FDR with 3650 Peptides decoy=36 threshold=0.886969 total=3686
INFO[10:24:48] Converged to 1.00 % FDR with 4672 Ions decoy=47 threshold=0.856732 total=4719
INFO[10:24:49] Protein inference results decoy=5843 target=8334
ERRO[10:24:49] the protein FDR filter didn't reach the desired threshold, try a higher threshold using the --prot parameter
I set the protein FDR to 100% to see how far it goes, and this is the output:
INFO[10:25:07] Converged to 71.40 % FDR with 8107 Proteins decoy=5788 threshold=0.6993 total=138959
It appears that your protein scores are low, but I'm not sure what is the cause. @guoci @yangkl96, Could you take a look at his data set and see if you spot anything wrong?
Hi Felipe,
Thanks a ton for looking into this. That is indeed strange... These are the exact same files as the (wrong) files I uploaded initially.
I've also run it with Percolator, but without MSBooster. See screenshot below. There, the output seems fine to me. Could the issue be due to MSBooster?
Hi Patrick,
Can you send me your FASTA file?
Thanks,
Fengchao
Yes, that's why I'm asking @yangkl96 to take a look for us.
I've uploaded it to Gdrive again and shared it with both of you.
Hi Patrick,
I'll take a look now too and see if MSBooster is causing anything weird
Hi All,
I ran the filter command with --prot 0.02 and it finished without any error:
λ C:\Users\yufe\Desktop\bin\philosopher_v4.1.1_windows_amd64\philosopher.exe filter --picked --prot 0.02 --tag rev_ --pepxml ./ --protxml combined.prot.xml --razor
INFO[11:36:07] Executing Filter v4.1.1
INFO[11:36:08] Processing peptide identification files
INFO[11:43:48] 1+ Charge profile decoy=31988 target=46328
INFO[11:43:48] 2+ Charge profile decoy=524019 target=1874617
INFO[11:43:49] 3+ Charge profile decoy=303136 target=921208
INFO[11:43:49] 4+ Charge profile decoy=165572 target=283229
INFO[11:43:49] 5+ Charge profile decoy=49751 target=58022
INFO[11:43:49] 6+ Charge profile decoy=0 target=0
INFO[11:44:11] Database search results ions=882946 peptides=721053 psms=4257870
INFO[11:44:29] Converged to 1.00 % FDR with 2013555 PSMs decoy=20326 threshold=0.7431 total=2.033881e+06
INFO[11:45:02] Converged to 1.00 % FDR with 15317 Peptides decoy=154 threshold=0.994341 total=15471
INFO[11:45:04] Converged to 1.00 % FDR with 21056 Ions decoy=212 threshold=0.991293 total=21268
INFO[11:45:06] Protein inference results decoy=5843 target=8334
INFO[11:45:07] Converged to 2.06 % FDR with 2722 Proteins decoy=56 threshold=0.9982 total=2778
INFO[11:46:17] 2D FDR estimation: Protein mirror image decoy=2722 target=2722
INFO[11:46:36] Second filtering results ions=153103 peptides=122092 psms=2415189
INFO[11:46:47] Converged to 1.00 % FDR with 2053592 PSMs decoy=20731 threshold=0.153119 total=2.074323e+06
INFO[11:46:47] Converged to 1.00 % FDR with 16203 Peptides decoy=163 threshold=0.954171 total=16366
INFO[11:46:48] Converged to 1.00 % FDR with 22094 Ions decoy=223 threshold=0.941882 total=22317
INFO[11:48:02] Post processing identifications
INFO[11:49:06] Assigning protein identifications to layers
INFO[11:49:32] Processing protein inference
INFO[11:49:43] Synchronizing PSMs and proteins
INFO[11:50:03] Total report numbers after FDR filtering, and post-processing ions=22092 peptides=16202 proteins=2722 psms=2053439
INFO[11:50:03] Saving
INFO[11:51:07] Done
Please note that there are only 56 decoy proteins at 2% protein FDR (with MSBooster). According to Patrick's screenshot, 1% protein FDR without MSBooster has 30 decoy proteins. So I have a guess: maybe there are no decoy proteins at 1% protein FDR with MSBooster. Felipe @prvst , is there a way to see the list of target and decoy proteins after filtering at 1% protein FDR?
Thanks,
Fengchao
Try setting the thresholds to 100%, and allow decoys to be reported, the tables should have all IDs
Hi Patrick,
Could you share with me just one of your uncalibrated mgf files? JHU_AD_PLASMA_PERCA_PRO_DDA_81_S4-G5_1_11706 should be good
Hi Felipe @prvst , it does not seem right. The log says "INFO[11:45:07] Converged to 2.06 % FDR with 2722 Proteins decoy=56 threshold=0.9982 total=2778", but there are only 7 decoys in the proteins.tsv.
Also, the top peptide probability is truncated to four decimal places, which makes it impossible to filter the proteins at 1% FDR using protein.tsv.
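To illustrate why four decimal places are not enough for re-filtering, here is a minimal Go sketch. It assumes the table stores probabilities with four fractional digits. The value 0.994341 is the peptide threshold from the log above; 0.994312 is a made-up neighbour used only for illustration. Note that strconv.FormatFloat rounds rather than truncates, which does not change the point.

```go
package main

import (
	"fmt"
	"strconv"
)

// fourDecimals formats a probability with four fractional digits, which is
// an assumption about how the report table stores the value.
func fourDecimals(p float64) string {
	return strconv.FormatFloat(p, 'f', 4, 64)
}

func main() {
	threshold := 0.994341 // peptide threshold taken from the log
	below := 0.994312     // hypothetical score just under the threshold

	// Both values print identically, so a table with four decimals cannot
	// tell which side of the threshold a row falls on.
	fmt.Println(fourDecimals(threshold), fourDecimals(below), below >= threshold)
}
```

Two scores on opposite sides of the threshold become indistinguishable once written to the table, so the filter cannot be reproduced from the table alone.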
Best,
Fengchao
Also, there are 2722 target proteins and 7 decoy proteins in total. Thus, the actual FDR is 0.26%, not 2%. I am using the 2D filter, but I don't think it matters for the protein FDR filtering.
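The arithmetic behind those two percentages can be checked with a short Go sketch. It uses the plain decoy/target ratio, which matches the numbers quoted here but is only an assumption about how Philosopher computes the value internally; decoyTargetFDR is a hypothetical helper name.

```go
package main

import "fmt"

// decoyTargetFDR returns decoys/targets as a percentage. This is the plain
// ratio used in the discussion, not necessarily Philosopher's estimator.
func decoyTargetFDR(decoys, targets int) float64 {
	if targets == 0 {
		return 0
	}
	return 100 * float64(decoys) / float64(targets)
}

func main() {
	// Counts quoted in the thread: the log line versus the rows in the table.
	fmt.Printf("log:   %.2f %%\n", decoyTargetFDR(56, 2722))
	fmt.Printf("table: %.2f %%\n", decoyTargetFDR(7, 2722))
}
```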
Best,
Fengchao
I think the remaining ones are being filtered out, I'll see if I can output the calculations in debug mode.
Sure. What email can I link the file on Gdrive to?
What filters it out? I thought FDR is the only filter.
PSMs and proteins need to be in sync. Since they are filtered separately, we take “orphan” entries out if they don't have supporting evidence.
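One way to picture this synchronization step is the Go sketch below. The psm type, its fields, and dropOrphans are hypothetical and much simpler than Philosopher's real data structures; the sketch only shows the idea of discarding proteins that no surviving PSM points to.

```go
package main

import "fmt"

// psm is a hypothetical, simplified record used only for illustration.
type psm struct {
	Spectrum string
	Protein  string
}

// dropOrphans keeps only the proteins that still have at least one PSM
// pointing at them after PSM-level filtering.
func dropOrphans(proteins []string, psms []psm) []string {
	supported := make(map[string]bool, len(psms))
	for _, p := range psms {
		supported[p.Protein] = true
	}
	kept := make([]string, 0, len(proteins))
	for _, name := range proteins {
		if supported[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	proteins := []string{"P1", "rev_P2", "P3"}
	psms := []psm{{"scan1", "P1"}, {"scan2", "P3"}}

	// rev_P2 has no supporting PSM left, so it is removed from the report.
	fmt.Println(dropOrphans(proteins, psms))
}
```

Under this reading, a decoy protein counted by the protein-level FDR step can disappear from the final table if none of its PSMs passed the PSM-level filter.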
yangkl@umich.edu
I did just share the uncalibrated mgf with the three of you.
Hi Patrick,
I tested out MSBooster+Percolator on just that one file and the run finished successfully. Do you happen to have the ID numbers before the protein inference step in your log files (before the crash)? I'd like to confirm that you're seeing more PSMs/peptides before the protein inference step with MSBooster than without it.
Would you also mind running MSBooster with just the spectra features box checked (uncheck RT features box) or vice versa? It would be interesting if protein-level FDR filtering worked with just a subset of features.
Thanks, Kevin
Hi Kevin,
I'm not really sure what numbers you are looking for. I've attached two logs: one with MSBooster that crashed, and one without MSBooster. log_MSbooster_withCrash.txt log_noMSbooster.txt
I will now run MSBooster with only one of the two features enabled at a time.
Cheers, Patrick
Hi,
I've tried with either one of the two features on. Having only predict Spectra on runs fine. Having only predict RT on creates errors.
Hope this helps. Best, Patrick
This was really helpful, Patrick. I am looking at some of the RT plots in the RTplots folder you shared and notice some horizontal streaking, where PSMs of the same peptide are eluting much later than the predicted RT.
@kabalak , do your RT plots look like this? And are you able to get MSBooster+Percolator to run with just spectra features, but have the same issue when just using RT features?
OK, I thought I would never read the Philosopher source code, but I was wrong. I just could not help trying to figure out why adding one more score would crash the FDR filtering: the filtering is supposed to take whatever input it gets, sort it, and then calculate the decoy/target ratios. If the scores are bad, the filter command should just give fewer proteins, but it should not crash.
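The expectation described here (sort, count decoys and targets, report zero rather than crash) can be sketched in Go as follows. The entry type and countAtFDR are hypothetical names, and the code is a generic target-decoy scan, not a copy of what ProtXMLFilter does.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical scored identification; the field names are
// illustrative and do not mirror Philosopher's types.
type entry struct {
	Score float64
	Decoy bool
}

// countAtFDR sorts entries by descending score and returns the number of
// targets in the longest prefix whose decoy/target ratio is <= maxFDR.
// It returns 0 when no prefix qualifies and never indexes out of range.
func countAtFDR(entries []entry, maxFDR float64) int {
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(a, b int) bool { return sorted[a].Score > sorted[b].Score })

	targets, decoys, best := 0, 0, 0
	for _, e := range sorted {
		if e.Decoy {
			decoys++
		} else {
			targets++
		}
		if targets > 0 && float64(decoys)/float64(targets) <= maxFDR {
			best = targets
		}
	}
	return best
}

func main() {
	entries := []entry{{0.99, false}, {0.98, false}, {0.97, true}, {0.60, false}}
	fmt.Println("targets accepted at 1% FDR:", countAtFDR(entries, 0.01))

	// Only decoys: nothing passes, and the function simply reports zero.
	fmt.Println("targets accepted, decoys only:", countAtFDR([]entry{{0.9, true}}, 0.01))
}
```

Because the scan uses range over the slice, poor scores can only shrink the result; they cannot produce an out-of-range index.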
Then, I started to read the ProtXMLFilter function in the fdr.go file, based on the clue in the error message:
INFO[20:46:11] Protein inference results decoy=5825 target=8317
panic: runtime error: index out of range [2723] with length 2723
goroutine 1 [running]:
philosopher/lib/fil.ProtXMLFilter(0x0, 0x0, 0xc000018110, 0x4, 0xc1750d8000, 0x3516, 0x3733, 0xc3595513c0, 0x1f, 0x3f847ae147ae147b, ...)
/workspace/philosopher/lib/fil/fdr.go:544 +0x1b32
philosopher/lib/fil.ProcessProteinIdentifications(0x0, 0x0, 0xc000018110, 0x4, 0xc1750d8000, 0x3516, 0x3733, 0xc3595513c0, 0x1f, 0x3f847ae147ae147b, ...)
/workspace/philosopher/lib/fil/fil.go:551 +0x345
philosopher/lib/fil.Run(0xc00001d650, 0x24, 0xc0001a2bc0, 0x33, 0xc000024960, 0x46, 0xc0000249b0, 0x42, 0xc0001a2c00, 0x39, ...)
/workspace/philosopher/lib/fil/fil.go:73 +0x2396
philosopher/cmd.glob..func5(0x868ec60, 0xc000202780, 0x0, 0xa)
/workspace/philosopher/cmd/filter.go:43 +0x49e
github.com/spf13/cobra.(*Command).execute(0x868ec60, 0xc000202640, 0xa, 0xa, 0x868ec60, 0xc000202640)
/home/prvst/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:844 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0x868da00, 0x3dbb01, 0x0, 0x0)
/home/prvst/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:945 +0x336
github.com/spf13/cobra.(*Command).Execute(...)
/home/prvst/go/pkg/mod/github.com/spf13/cobra@v0.0.6/command.go:885
philosopher/cmd.Execute()
/workspace/philosopher/cmd/root.go:35 +0x34
main.main()
/workspace/philosopher/main.go:22 +0x75
Process 'PhilosopherFilter' finished, exit code: 2
Process returned non-zero exit code, stopping
I think I might have found the cause. Felipe @prvst , please feel free to correct me if I am wrong. It is my first time reading the Go programming language, and I have not run any debugging code, so my understanding is very likely to be wrong.
In this for-loop starting from https://github.com/Nesvilab/philosopher/blob/03b93d86a988eca00b447d094d7f87daba2c9d49/lib/fil/fdr.go#L547
if curScore < targetFDR && fmtScore != targetFDR && probArray[len(probArray)-1] != curProb {
    for i := 0; i <= len(probArray); i++ {
        if probArray[i] == curProb {
            probList[probArray[i+1]] = 0
            minProb = probArray[i+1]
            calcFDR = scoreMap[probArray[i+1]]
            // if probArray[i+1] < curProb {
            //     curProb = probArray[i+1]
            // }
            // if scoreMap[probArray[i+1]] > curScore {
            //     curScore = scoreMap[probArray[i+1]]
            // }
            break
        }
    }
}
i is in [0, len(probArray)], but the program reads probArray[i+1]. This means that i + 1 falls outside probArray's valid range (0 to len(probArray) - 1) when i == len(probArray) - 1 or i == len(probArray), which causes exactly the crash we see in the error message: panic: runtime error: index out of range [2723] with length 2723. We have not seen this crash often because several conditions need to be met to reach that line.
Even if that is not the reason for the crash, that for-loop still needs to be fixed if my understanding is correct.
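As a sketch of what a bounds-safe version of that lookup could look like, consider the Go code below. nextProbability is a hypothetical helper, not Philosopher's code, and it only covers the "find curProb and take the next element" part of the loop. Note that with the original condition i <= len(probArray), the comparison probArray[i] is itself out of range once i reaches len(probArray), which happens whenever curProb is not found in the slice at all.

```go
package main

import "fmt"

// nextProbability is a hypothetical helper, not Philosopher's code. It looks
// for curProb in probArray and returns the element that follows it. The
// boolean is false when curProb is absent or is the last element, so the
// caller never indexes past the end of the slice.
func nextProbability(probArray []float64, curProb float64) (float64, bool) {
	// Stop at len-1 so that probArray[i+1] is always a valid index.
	for i := 0; i < len(probArray)-1; i++ {
		if probArray[i] == curProb {
			return probArray[i+1], true
		}
	}
	return 0, false
}

func main() {
	probs := []float64{0.9999, 0.9982, 0.9954}

	if next, ok := nextProbability(probs, 0.9982); ok {
		fmt.Println("next probability:", next)
	}
	// A value that is not in the slice no longer walks off the end.
	if _, ok := nextProbability(probs, 0.5); !ok {
		fmt.Println("no successor found; caller should warn or stop")
	}
}
```

Returning a boolean leaves the decision to the caller: it can log a warning, fall back to the current threshold, or stop the filter, instead of panicking inside the loop.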
Best.
Fengchao
We know where the break happens because it has occurred before on rare occasions. We've seen a few cases like this since the first implementation, and they're usually related to very sparse data sets, or data with poor scoring. As you correctly pointed out, the reason is something else, and it might be related to the scoring.
So, am I correct or incorrect? If I am correct, the reason is not something else, but the bug I pointed out. If I am not correct, then could you please debug the ProtXMLFilter function to see which line throws the error message? I think you are the one who knows best how to compile and debug your program. It would take me a long time to figure out how to debug it since I have never used Go.
Best,
Fengchao
I'm talking about the reason for the crash. If the filter cannot reach 1% FDR with the boost, but it can without, using the same data, then I think we need to understand what is changing in the groups, and with the scores. Regarding the function, any improvements are welcome; you can open a PR or a ticket, and I'll work on it.
If "cannot reach 1% FDR" means there is no protein after filtering, it should not crash Philosopher. Philosopher should report 0 proteins if no protein passes 1% FDR. I also cannot understand why changing the score would be the reason for the crash. We change parameters all the time, and we do not see crashes all the time.
I am sorry that, although I know those two lines have bugs, I don't know how to fix them because I have never written Go. I think it should be easy for you to fix since you already know the location. I am just trying to help solve the issue. If you think those two lines are not the cause of the error, can you run your program in debug mode and tell us which line throws the error?
Thanks,
Fengchao
I'll add a handle to the loop, that should prevent the crash, but it will return a 0, and we'll probably want to halt the filter.
Thank you very much!
Can you also add a warning message? I think "add a handle" and printing a message are just for debugging the problem, not the final solution. We need to come up with some logic to fix it after locating the exact reason.
@yangkl96 , after Felipe sends you the Philosopher with the debug info, can you run the dataset again to see if the crash is gone?
Thanks,
Fengchao
Meanwhile, @yangkl96 we should look at both the data files as well.
Hi Team Fragpipe,
I've recently updated to FragPipe V17.0, where I am trying to use the "DIA_Speclib_Quant" workflow. In this case, I just want the library.tsv file.
There seems to be some error with the Percolator .pin files? I'm not completely sure.
I've added the log file. Hopefully it clarifies my issue. log_2021-12-01_14-13-15.txt
With kind regards, Patrick