Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

Large mass error resulted in low IDs #1306

Closed babsillo closed 1 year ago

babsillo commented 1 year ago

image image

These are two images from two different HeLa QC files, acquired with the same machine settings (QTOF); they show just the calibration part. I am pretty sure my problem is somewhere in here.

The upper one has issues: only about 1500 proteins were identified. The lower one is fine, with around 4000 proteins, which is what it should look like.

What exactly does this calibration table tell me?

  1. My MS1 data is around 15 ppm off, but my MS2 is not, right?
  2. In the lower image, a 10 ppm fragment mass tolerance gives me approx. 24000 MS2.
  3. In the upper image, 10 ppm gives only 1900 MS2. Why? It should also have around 2400.
  4. Do I see any problems in the calibration part?
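
For readers unfamiliar with the unit: a mass error in ppm is the deviation of the observed m/z from the theoretical m/z, divided by the theoretical m/z and scaled by one million. The sketch below only illustrates that definition; the function names and the example m/z of 800 are made up for this note and are not taken from MSFragger.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_to_mz(ppm: float, mz: float) -> float:
    """Absolute m/z shift that corresponds to a ppm error at a given m/z."""
    return ppm * mz / 1e6


# Hypothetical precursor at m/z 800 that is measured 15 ppm too high.
theoretical = 800.0
observed = theoretical + ppm_to_mz(15.0, theoretical)

print("absolute shift (m/z):", round(observed - theoretical, 4))
print("error (ppm):", round(ppm_error(observed, theoretical), 2))
```

A constant offset of this size is small in absolute terms, but it can still push a precursor outside a narrow ppm search window.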

With the vendor software both HeLa files are fine, with approximately the same protein/peptide IDs. I just don't understand the difference in MSFragger, or the data problem I am having. Being 15 ppm off in MS1 is a bit much, I would say. (I am also writing back and forth with the vendor about that issue.)

I want to get a feeling for, and a better understanding of, the output files, so that I can be faster and better at troubleshooting. The data were acquired with all settings the same, just a few days apart. msconvert was also run on the same PC with the same version; the same is true for the MSFragger versions.

Both log files are attached. If you can give me any hints, or tutorials describing the output in more detail, I would love that. Thanks in advance for helping, Barbara


anesvi commented 1 year ago

I looked at the log and cannot tell why. I suspect something else is wrong with that file. You may need to share the two raw files with us, along with the output from the "other software" that you say gives the same results for both files.

babsillo commented 1 year ago

Hi,

I am sorry for my late reply. I have now zip-compressed the Bruker .d files (I am measuring on a Bruker QTOF), and I am providing the mzML that I used for MSFragger.

I am also providing the mzID as the search result from the ProteinScape software; I use PS just for protein ID. I search with Mascot, and I don't know exactly how I should send the results. As the files are really big, I want to share them via our medunibox. Can you provide an email address that I can use to share the files?

Please tell me if it worked, as I am not sure about it. I know that my MS1 data is quite uncalibrated at the moment, around 14-15 ppm. Does this affect the mass calibration in MSFragger? In ProteinScape and MSFragger I tried to use the same parameters, or at least as similar as possible. In the meantime I am working on my machine to find out why the mass is so far off.

Thanks in advance Barbara

fcyu commented 1 year ago

Hi Barbara,

Could you upload the files, including the workflow file, .d folder, fasta file, result files, and the other software's results to https://www.dropbox.com/request/Xv3TrAYciOSpLLkhdw8R . We will investigate after receiving the files.

Thanks,

Fengchao

fcyu commented 1 year ago

Thanks for your files. It seems that there was no decoy in your Mascot search. May I ask how you filtered the results to get 1% FDR?

Thanks,

Fengchao

babsillo commented 1 year ago

I do run a decoy search; the decoys are just not directly in the database. I tick a checkbox for it, but Mascot then sends the results to ProteinScape.

[image missing]

These are my search settings. Only the results from PS are not easy to provide. I gave you the Mascot output directly. Or do you prefer other lists, like Excel?

Best Barbara

fcyu commented 1 year ago

Hi Barbara,

The mzID file (the mascot output) doesn't have decoys. I am not sure if the results have been filtered. Could you send the FDR filtered csv/tsv/Excel files?

Thanks,

Fengchao

babsillo commented 1 year ago

Hi,

Attached is an Excel file with the ProteinScape output from our Mascot search. The data was definitely searched with decoys; Mascot generates what it needs by itself. Final FDR in the case of the 1382: 0.80%

For MSFragger, I loaded the FASTA and let MSFragger generate the database with contaminants and decoys. I still think that, because my data is uncalibrated (around 15 ppm off), MSFragger has a problem recalibrating it. Do you think this could be a reason?

ProteinScape uses the internally recalibrated .mgf files generated from the Bruker maXis. Do you also want to have them?

Best Barbara

fcyu commented 1 year ago

Hi Barbara,

I didn't receive your files. Maybe GitHub truncated them. But after I enlarged the precursor tolerance and precursor true tolerance to 50 ppm, I got a reasonable number of IDs: image image

INFO[12:27:49] Converged to 1.00 % FDR with 37722 PSMs       decoy=377 threshold=0.656577 total=38099
INFO[12:27:49] Converged to 1.00 % FDR with 23700 Peptides   decoy=237 threshold=0.786709 total=23937
INFO[12:27:49] Converged to 1.00 % FDR with 26960 Ions       decoy=269 threshold=0.748066 total=27229
INFO[12:27:49] Protein inference results                     decoy=515 target=5428
INFO[12:27:49] Converged to 0.98 % FDR with 4580 Proteins    decoy=45 threshold=0.9657 total=4625
INFO[12:27:50] Applying sequential FDR estimation            ions=26818 peptides=23619 psms=37423
INFO[12:27:50] Converged to 0.15 % FDR with 37368 PSMs       decoy=55 threshold=0.656577 total=37423
INFO[12:27:50] Converged to 0.20 % FDR with 23573 Peptides   decoy=46 threshold=0.656577 total=23619
INFO[12:27:50] Converged to 0.17 % FDR with 26772 Ions       decoy=46 threshold=0.656577 total=26818
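
As a reading aid for the log lines above: the reported percentages are consistent with FDR = decoys / targets, where the count before "PSMs", "Peptides", etc. is the number of targets and `total` is targets plus decoys. That is an observation from the numbers shown here, not a statement about how the filtering tool computes FDR internally. The following sketch parses such a line and recomputes the value; the regular expression and the function name are made up for this note.

```python
import re

# Pattern for the "Converged to ... FDR" lines shown in this thread.
LOG_LINE = re.compile(
    r"Converged to\s+(?P<fdr>[\d.]+)\s*% FDR with\s+(?P<targets>\d+)\s+(?P<level>\w+)"
    r"\s+decoy=(?P<decoys>\d+)\s+threshold=(?P<threshold>[\d.]+)\s+total=(?P<total>\d+)"
)


def check_fdr_line(line: str) -> dict:
    """Parse one log line and recompute FDR as decoys / targets (in percent)."""
    m = LOG_LINE.search(line)
    if m is None:
        raise ValueError("not a 'Converged to ... FDR' line")
    targets = int(m["targets"])
    decoys = int(m["decoys"])
    total = int(m["total"])
    return {
        "level": m["level"],
        "reported_fdr_pct": float(m["fdr"]),
        "computed_fdr_pct": round(100.0 * decoys / targets, 2),
        "total_matches": targets + decoys == total,
    }


psm_line = ("INFO[12:27:49] Converged to 1.00 % FDR with 37722 PSMs       "
            "decoy=377 threshold=0.656577 total=38099")
print(check_fdr_line(psm_line))
```
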

And the mass calibration also showed that the error was ~25 ppm. I guess that was why 10 and 20 ppm didn't give you good IDs:

*********************MASS CALIBRATION AND PARAMETER OPTIMIZATION*******************
-----|---------------|---------------|---------------|---------------
     |  MS1   (Old)  |  MS1   (New)  |  MS2   (Old)  |  MS2   (New)  
-----|---------------|---------------|---------------|---------------
 Run |  Median  MAD  |  Median  MAD  |  Median  MAD  |  Median  MAD  
 001 |  23.81   1.45 |  -3.40   4.91 |   0.26   4.70 |   0.02   4.71  
-----|---------------|---------------|---------------|---------------
Finding the optimal parameters:
-------|-------|-------|-------|-------|-------|-------
  MS2  |    7  |   10  |   15  |   20  |   25  |   30  
-------|-------|-------|-------|-------|-------|-------
 Count | skip  |  25254|  25068| skip rest
-------|-------|-------|-------|-------|-------|-------
-------|-------|-------|-------|-------
 Peaks | 300_0 | 200_0 | 150_1 | 100_1 
-------|-------|-------|-------|-------
 Count |  25482|  25730|  25254| skip rest
-------|-------|-------|-------|-------
-------|-------
 Int.  |    1  
-------|-------
 Count |  24024
-------|-------
-------|-------
 Rm P. |    0  
-------|-------
 Count |  25635
-------|-------
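
One rough way to read the "MS1 (Old)" column: if most precursor errors sit within a few MADs of the median, the search window has to cover that whole range before calibration can see the matches. The sketch below checks this for the values in the table above. The rule of "median plus or minus 3 MAD" is an arbitrary heuristic chosen for illustration; it is not how MSFragger decides anything.

```python
def window_covers(median_ppm: float, mad_ppm: float, tol_ppm: float, k: float = 3.0) -> bool:
    """True if [median - k*MAD, median + k*MAD] lies inside [-tol, +tol]."""
    low = median_ppm - k * mad_ppm
    high = median_ppm + k * mad_ppm
    return -tol_ppm <= low and high <= tol_ppm


# MS1 (Old) median and MAD for run 001, copied from the calibration table.
ms1_old_median, ms1_old_mad = 23.81, 1.45

for tol in (10, 20, 50):
    covered = window_covers(ms1_old_median, ms1_old_mad, tol)
    print(f"+/-{tol} ppm window covers the bulk of MS1 errors: {covered}")
```

With these inputs, only the 50 ppm window covers the range, which fits the observation that 10 and 20 ppm gave few IDs.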

Best,

Fengchao

babsillo commented 1 year ago

Hi,

Thanks a lot for your help. I will try to reproduce these IDs.

But, hell no, my data is quite far from being accurate. It is nice that I can still work with MSFragger, but it would be better to have better-calibrated data, right? And opening the ppm windows that wide should be more of an emergency rescue, am I right?

In the meantime I did an MSFragger search with the recalibrated .mgf (from Bruker), and I got nice results with the standard setting of 10 ppm. So the .mgf files are recalibrated, but I cannot do LFQ with them; the .mzML files work for LFQ but are not recalibrated.

So, am I right that I should:

  1. Try to get a better-calibrated machine.
  2. If that is not possible (for whatever reason), try to get the .mzML internally recalibrated as well for LFQ (I will contact the vendor).
  3. If 2. is also not possible, open the mass window really, really wide, so that I have enough IDs and LFQ.
  4. Am I right, and does a very wide mass window have any bad influence on the outcome? Any negative aspects (besides the data really being uncalibrated)?

Thanks again for your help. I really didn't know we were that far off.

Best Barbara


fcyu commented 1 year ago

Hi Barbara,

Yeah, you can set both the precursor mass tolerance and the precursor true mass tolerance to 50 ppm to resolve this issue. As far as I know, it won't affect the result too much because MSFragger is quite robust (we used 50 ppm as the default setting before). You can also use the calibrated MGF file from the vendor, but (as you also know) you can't perform label-free quantification.

Best,

Fengchao

babsillo commented 1 year ago

Hi,

I tried to rerun the 1379.mzML HeLa file with the following conditions:

[image missing]

but it didn't work out as it did in your case...

[image missing]

What am I missing? Best Barbara


fcyu commented 1 year ago

I didn't see anything in your previous message. GitHub often truncates the emails. You'd better go to https://github.com/Nesvilab/FragPipe/issues/1306 and reply to the message there.

Also, could you share your log file?

Thanks,

Fengchao

babsillo commented 1 year ago

Sorry, I replied directly to the mail and didn't notice that the images got lost. I am sorry for that.

I tried to reproduce your results with the 1379 HeLa file. Did you work with the mzML or directly with the .d folder? I took the 1379.mzML with 50 ppm precursor and 50 ppm fragment mass tolerance.

image

But I got different calibration results, with low IDs.

image

So, am I missing something else? I want to use MSFragger for LFQ, so I really hope to get this right. Thanks Barbara

fcyu commented 1 year ago

You should also change the precursor true tolerance at the bottom of the MSFragger tab:

image

Best,

Fengchao

babsillo commented 1 year ago

It is clearly an advantage if somebody can read properly. (I really feel sorry.) With 50 ppm precursor, 50 ppm fragment, and 20 ppm true tolerance (because of my lack of reading ability) I still got 3990 proteins: image

image

With your 50 ppm true tolerance I got 3936 proteins, nearly as many as in the first search. My results are not the same as yours, but more similar. image image

Now the difference is quite small and I am really happy that I can work with my data, let's say, nearly properly. I attached the log file from the second search (50 ppm for all tolerances). Do you by any chance see a last (hopefully last) parameter that differs from your search, so that I can work as well as possible? Is one of these strategies better than the other? I mean 50/50/50 ppm versus 50/50/still 20 ppm for the true tolerance? Thanks again for your help and your patience. And I am really sorry for not understanding some things in the first place.

Best Barbara

Log file, 50/50/50 ppm: log_2023-10-31_21-31-58.txt

Log file, 50/50/still 20 ppm true tolerance: log_2023-10-31_19-22-12.txt

fcyu commented 1 year ago

Hi Barbara,

Glad to see that you got results similar to mine. One thing: you don't need to change the fragment tolerance from 20 to 50 ppm because the MS2 mass seems OK. You just need to change the precursor and precursor true tolerance.
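
For anyone applying this outside the GUI, here is a minimal sketch of widening only the precursor settings in a `key = value` parameter text while leaving the fragment tolerance alone. The key names and the "1 = ppm" unit code are assumptions based on the usual MSFragger parameter-file style; check them against your own fragger.params or FragPipe workflow file. The helper `set_params` is hypothetical, and it drops any inline comment on a line it rewrites.

```python
def set_params(text: str, updates: dict) -> str:
    """Return the params text with `key = value` lines replaced for the given keys."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key} = {remaining.pop(key)}")
        else:
            out.append(line)
    # Append any requested keys that were not present in the input.
    out.extend(f"{k} = {v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Assumed key names; verify against your own parameter file.
wide_precursor = {
    "precursor_mass_lower": -50,
    "precursor_mass_upper": 50,
    "precursor_mass_units": 1,       # assumed: 1 = ppm
    "precursor_true_tolerance": 50,
    "precursor_true_units": 1,       # assumed: 1 = ppm
}

example = (
    "precursor_mass_lower = -20\n"
    "precursor_mass_upper = 20\n"
    "fragment_mass_tolerance = 20\n"
)
print(set_params(example, wide_precursor))
```

The fragment tolerance line passes through unchanged, matching the advice that only the precursor and precursor true tolerance need to be widened.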

I got 4580 proteins. Here is my workflow in case you want to reproduce it: fragpipe.zip

Best,

Fengchao

babsillo commented 1 year ago

Thanks. I got 4565 proteins with your workflow. I got it now. Thank you so much. Now I do not need to remeasure a whole lot of samples. You have no idea how you made my day today. Thanks again.

I also want to transfer our phospho analysis to MSFragger, but I am still a bit lost with the result files. In case you offer trainings, schools, workshops, whatever, I would love to be part of it. If there is a newsletter, please sign me up. I would like to know as much as possible, at least as much as a non-IT person can understand.

Best Barbara

fcyu commented 1 year ago

Hi Barbara,

Glad to hear that FragPipe works well for you now.

We have a lot of tutorials here: https://fragpipe.nesvilab.org/ We also had some short courses at USHUPO and EMBO. We might have some webinars or online training in the future. Stay tuned!

Best,

Fengchao