Closed babsillo closed 1 year ago
I looked at the log and cannot tell why. I suspect something else is wrong with that file. You may need to share the two raw files with us, along with the output from the "other software" that you say gives the same results for both files.
Hi,
I am sorry for my late reply. I have now zipped the Bruker .d folders (I am measuring on a Bruker QTOF). I am also providing the mzML files that I used for MSFragger,
and the mzID file as the search result from the ProteinScape software. I use ProteinScape (PS) only for protein identification. I search with Mascot, and I don't know exactly how I should send those results. As the files are really big, I would like to share them via our medunibox. Can you provide an email address that I can use to share the files?
Please tell me if it worked, as I am not sure about it. I know that my MS1 data is quite uncalibrated at the moment, around 14-15 ppm off. Does this affect the mass calibration in MSFragger? In ProteinScape and MSFragger I tried to use the same parameters, or at least parameters as similar as possible. In the meantime I will work on my machine and hopefully find out why the mass is so far off.
Thanks in advance, Barbara
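For reference, a mass error in ppm is the deviation between observed and theoretical m/z, divided by the theoretical m/z and multiplied by one million. The following is a minimal Python sketch of that arithmetic. The m/z value of 800.0 and the function names are made up for illustration and are not taken from the files discussed here.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the observed m/z lies inside a symmetric +/- tol_ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical precursor: theoretical m/z 800.0, measured 15 ppm too high.
theoretical = 800.0
observed = theoretical * (1 + 15e-6)

print(round(ppm_error(observed, theoretical), 2))
for tol in (10, 20, 50):
    # A 15 ppm offset falls outside a 10 ppm window but inside 20 or 50 ppm.
    print(tol, within_tolerance(observed, theoretical, tol))
```

In this toy case the precursor would be missed with a 10 ppm window even though the measurement is otherwise fine, which is why a systematic offset matters for the search window.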
From: Alexey Nesvizhskii. Sent: Monday, 23 October 2023, 20:24. To: Nesvilab/FragPipe. Cc: Darnhofer, Barbara; Author. Subject: [EXT] Re: [Nesvilab/FragPipe] data issue vs. msfragger issue (Issue #1306)
Hi Barbara,
Could you upload the files, including the workflow file, the .d folder, the FASTA file, the result files, and the other software's results, to https://www.dropbox.com/request/Xv3TrAYciOSpLLkhdw8R ? We will investigate after receiving the files.
Thanks,
Fengchao
Thanks for your files. It seems that there were no decoys in your Mascot search. May I ask how you filtered the results to get 1% FDR?
Thanks,
Fengchao
I do run a decoy search; the decoys are just not directly in the database. I enable them with a checkbox, but Mascot then sends the results to ProteinScape.
These are my search settings. The results from PS are just not easy to provide, so I gave you the Mascot output directly. Or do you prefer other lists, such as Excel?
Best, Barbara
From: Fengchao. Sent: Monday, 30 October 2023, 19:12. To: Nesvilab/FragPipe. Cc: Darnhofer, Barbara; Author. Subject: [EXT] Re: [Nesvilab/FragPipe] data issue vs. msfragger issue (Issue #1306)
Hi Barbara,
The mzID file (the mascot output) doesn't have decoys. I am not sure if the results have been filtered. Could you send the FDR filtered csv/tsv/Excel files?
Thanks,
Fengchao
Hi,
Attached is an Excel file with the ProteinScape output from our Mascot search. The data was definitely searched with decoys; Mascot generates what it needs itself. The final FDR for 1382 is 0.80%.
For MSFragger, I loaded the FASTA file and let MSFragger generate the database with contaminants and decoys. I still wonder whether MSFragger has a problem recalibrating my data because it is uncalibrated (around 15 ppm off). Do you think this could be a reason?
ProteinScape uses the internally recalibrated .mgf files generated from the Bruker maXis. Do you want to have them as well?
Best, Barbara
From: Fengchao. Sent: Tuesday, 31 October 2023, 14:38. To: Nesvilab/FragPipe. Cc: Darnhofer, Barbara; Author. Subject: [EXT] Re: [Nesvilab/FragPipe] data issue vs. msfragger issue (Issue #1306)
Hi Barbara,
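I didn't receive your files. Maybe GitHub truncated the email. But after I widened the precursor tolerance and the precursor true tolerance to 50 ppm, I got a reasonable number of IDs:
[image] https://user-images.githubusercontent.com/6926299/279452164-69eb8d5b-44ca-40f4-8649-8953efa64a0f.png
[image] https://user-images.githubusercontent.com/6926299/279452208-3f7063ea-061d-479a-b93c-30d3ea2bb0a4.png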
INFO[12:27:49] Converged to 1.00 % FDR with 37722 PSMs decoy=377 threshold=0.656577 total=38099
INFO[12:27:49] Converged to 1.00 % FDR with 23700 Peptides decoy=237 threshold=0.786709 total=23937
INFO[12:27:49] Converged to 1.00 % FDR with 26960 Ions decoy=269 threshold=0.748066 total=27229
INFO[12:27:49] Protein inference results decoy=515 target=5428
INFO[12:27:49] Converged to 0.98 % FDR with 4580 Proteins decoy=45 threshold=0.9657 total=4625
INFO[12:27:50] Applying sequential FDR estimation ions=26818 peptides=23619 psms=37423
INFO[12:27:50] Converged to 0.15 % FDR with 37368 PSMs decoy=55 threshold=0.656577 total=37423
INFO[12:27:50] Converged to 0.20 % FDR with 23573 Peptides decoy=46 threshold=0.656577 total=23619
INFO[12:27:50] Converged to 0.17 % FDR with 26772 Ions decoy=46 threshold=0.656577 total=26818
And the mass calibration also showed that the error was ~25 ppm. I guess that was why 10 and 20 ppm didn't give you good IDs:
*********************MASS CALIBRATION AND PARAMETER OPTIMIZATION*******************
-----|---------------|---------------|---------------|---------------
     |  MS1   (Old)  |  MS1   (New)  |  MS2   (Old)  |  MS2   (New)
-----|---------------|---------------|---------------|---------------
 Run |  Median  MAD  |  Median  MAD  |  Median  MAD  |  Median  MAD
 001 |  23.81   1.45 |  -3.40   4.91 |   0.26   4.70 |   0.02   4.71
-----|---------------|---------------|---------------|---------------
Finding the optimal parameters:
-------|-------|-------|-------|-------|-------|-------
  MS2  |    7  |   10  |   15  |   20  |   25  |   30
-------|-------|-------|-------|-------|-------|-------
 Count |  skip |  25254|  25068| skip rest
-------|-------|-------|-------|-------|-------|-------
-------|-------|-------|-------|-------
 Peaks | 300_0 | 200_0 | 150_1 | 100_1
-------|-------|-------|-------|-------
 Count |  25482|  25730|  25254| skip rest
-------|-------|-------|-------|-------
-------|-------
 Int.  |    1
-------|-------
 Count |  24024
-------|-------
-------|-------
 Rm P. |    0
-------|-------
 Count |  25635
-------|-------
Best,
Fengchao
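The FDR figures in the log above can be cross-checked from the decoy and total counts. The sketch below is a minimal Python example, not part of FragPipe. It assumes the reported FDR is the number of decoys divided by the number of targets (total minus decoys), which matches the three sample lines copied from the log; the regular expression and function name are illustrative.

```python
import re

LOG = """\
INFO[12:27:49] Converged to 1.00 % FDR with 37722 PSMs decoy=377 threshold=0.656577 total=38099
INFO[12:27:49] Converged to 0.98 % FDR with 4580 Proteins decoy=45 threshold=0.9657 total=4625
INFO[12:27:50] Converged to 0.15 % FDR with 37368 PSMs decoy=55 threshold=0.656577 total=37423
"""

PATTERN = re.compile(
    r"Converged to (?P<fdr>[\d.]+) % FDR with (?P<n>\d+) (?P<level>\w+) "
    r"decoy=(?P<decoy>\d+) threshold=[\d.]+ total=(?P<total>\d+)"
)


def recompute_fdr(log_text):
    """Return (level, reported %, recomputed %) for each 'Converged' line."""
    rows = []
    for m in PATTERN.finditer(log_text):
        decoy, total = int(m["decoy"]), int(m["total"])
        targets = total - decoy
        # Assumption: FDR is estimated as decoys / targets.
        rows.append((m["level"], float(m["fdr"]), round(100 * decoy / targets, 2)))
    return rows


for level, reported, recomputed in recompute_fdr(LOG):
    print(f"{level:9s} reported={reported:.2f}% recomputed={recomputed:.2f}%")
```

If the recomputed and reported values agree, the count after "with" (for example 37722 PSMs) is simply the total minus the decoys.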
Hi,
Thanks a lot for your help.
I will try to reproduce these IDs.
But, hell no, my data is quite far from being accurate.
It is nice that I can still work with MSFragger, but it would be better to have properly calibrated data, right?
And opening the ppm windows that wide should be more of an emergency rescue, am I right?
In the meantime I ran an MSFragger search with the recalibrated .mgf files (from Bruker), and I got nice results with the standard 10 ppm setting.
So, am I right that:
the .mgf files are recalibrated, but I cannot do LFQ with them;
the .mzML files work for LFQ, but are not recalibrated?
Thanks again for your help. I really didn't know we were that far off.
Best, Barbara
From: Fengchao. Sent: Tuesday, 31 October 2023, 17:47:18. To: Nesvilab/FragPipe. Cc: Darnhofer, Barbara; Author. Subject: [EXT] Re: [Nesvilab/FragPipe] data issue vs. msfragger issue (Issue #1306)
Hi Barbara,
Yeah, you can set both the precursor mass tolerance and the precursor true mass tolerance to 50 ppm to resolve this issue. As far as I know, it won't affect the result too much because MSFragger is quite robust (we used 50 ppm as the default setting before). You can also use the calibrated MGF file from the vendor, but (as you also know) you can't perform label-free quantification.
Best,
Fengchao
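For anyone who prefers to script the change rather than use the FragPipe GUI, the sketch below rewrites `key = value` lines in a parameter text. It is a hedged illustration only: the key names (`precursor_mass_lower`, `precursor_mass_upper`, `precursor_true_tolerance`, and so on) are assumed to follow the MSFragger parameter-file naming and should be checked against your own file, and `set_params` is a hypothetical helper, not a FragPipe function.

```python
import re


def set_params(params_text: str, updates: dict) -> str:
    """Replace the value of each `key = value` line named in `updates`.

    Trailing `# comments` are preserved; keys that are not found are appended.
    """
    lines = params_text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        m = re.match(r"^(\s*)([\w.]+)(\s*=\s*)([^#]*?)(\s*(#.*)?)$", line)
        if m and m.group(2) in updates:
            key = m.group(2)
            lines[i] = f"{m.group(1)}{key}{m.group(3)}{updates[key]}{m.group(5)}"
            seen.add(key)
    for key, value in updates.items():
        if key not in seen:
            lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


# Sample text with assumed key names; values mirror a 20 ppm setup.
SAMPLE = """\
precursor_mass_lower = -20
precursor_mass_upper = 20
precursor_mass_units = 1    # 0 = Da, 1 = ppm
precursor_true_tolerance = 20
precursor_true_units = 1
fragment_mass_tolerance = 20
"""

# Widen only the precursor window and the precursor true tolerance to 50 ppm.
widened = set_params(SAMPLE, {
    "precursor_mass_lower": -50,
    "precursor_mass_upper": 50,
    "precursor_true_tolerance": 50,
})
print(widened)
```

The fragment tolerance is deliberately left untouched in this sketch, since only the precursor settings are discussed as needing the wider window.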
Hi,
I tried to rerun the 1379.mzML HeLa file with the following conditions:
[image]
but it didn't work out as it did in your case...
[image]
What am I missing?
Best, Barbara
From: Fengchao. Sent: Tuesday, 31 October 2023, 18:39. To: Nesvilab/FragPipe. Cc: Darnhofer, Barbara; Author. Subject: [EXT] Re: [Nesvilab/FragPipe] Large mass error resulted in low IDs (Issue #1306)
I didn't see anything in your previous message. GitHub often truncates emails. It is better to go to https://github.com/Nesvilab/FragPipe/issues/1306 and reply to the message there.
Also, could you share your log file?
Thanks,
Fengchao
Sorry, I replied directly to the email and didn't notice that the images got lost. I am sorry for that.
I tried to reproduce your results with the 1379 HeLa file. Did you work with the mzML or directly with the .d folder? I took 1379.mzML with 50 ppm precursor and 50 ppm fragment mass tolerance,
but I got different calibration results, with low IDs.
So, am I missing something else? I want to use MSFragger for LFQ, so I really hope to get this right. Thanks, Barbara
You should also change the precursor true tolerance at the bottom of the MSFragger tab:
Best,
Fengchao
It is clearly an advantage if somebody can read properly (I really am sorry). With 50 ppm precursor, 50 ppm fragment, and 20 ppm true tolerance (because I had not read carefully), I still got 3990 proteins:
With your 50 ppm true tolerance I got 3936 proteins, nearly as many as in the first search. My results are not the same as yours, but they are more similar.
Now the difference is quite small, and I am really happy that I can work with my data more or less properly. I attached the log file from the second search (50 ppm for all tolerances). Do you by any chance see a last (hopefully last) parameter that differs from your search, so that I can work as well as possible? Is one of these strategies better than the other, that is, 50/50/50 ppm versus 50/50 ppm with the true tolerance still at 20 ppm? Thanks again for your help and your patience. And I am really sorry for not understanding some things in the first place.
Best, Barbara
Log file (50/50/50 ppm): log_2023-10-31_21-31-58.txt
Log file (50/50 ppm, true tolerance still at 20 ppm): log_2023-10-31_19-22-12.txt
Hi Barbara,
Glad to see that you got results similar to mine. One thing: you don't need to change the fragment tolerance from 20 to 50 ppm, because the MS2 mass seems OK. You just need to change the precursor tolerance and the precursor true tolerance.
I got 4580 proteins. Here is my workflow in case you want to reproduce it: fragpipe.zip
Best,
Fengchao
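The reasoning above (MS1 needs the wider window, MS2 does not) can be read straight off the calibration row posted earlier in this thread. The following Python sketch parses that row and compares each "Old" median with a tolerance. Both function names and the comparison rule are illustrative assumptions, not MSFragger's own logic.

```python
def parse_calibration_row(row: str) -> dict:
    """Parse a row such as '001 | 23.81 1.45 | -3.40 4.91 | 0.26 4.70 | 0.02 4.71'."""
    cells = [c.strip() for c in row.split("|")]
    names = ["ms1_old", "ms1_new", "ms2_old", "ms2_new"]
    parsed = {"run": cells[0]}
    for name, cell in zip(names, cells[1:]):
        median, mad = (float(x) for x in cell.split())
        parsed[name] = {"median": median, "mad": mad}
    return parsed


def needs_wider_window(median_ppm: float, tolerance_ppm: float) -> bool:
    """Rough heuristic (an assumption): flag the window when the systematic
    offset alone already exceeds the tolerance."""
    return abs(median_ppm) > tolerance_ppm


row = parse_calibration_row("001 | 23.81 1.45 | -3.40 4.91 | 0.26 4.70 | 0.02 4.71")

# Precursor (MS1) offset before calibration versus a 20 ppm window.
print(needs_wider_window(row["ms1_old"]["median"], 20))
# Fragment (MS2) offset before calibration versus a 20 ppm window.
print(needs_wider_window(row["ms2_old"]["median"], 20))
```

With the values from this thread, the MS1 median of 23.81 ppm exceeds a 20 ppm window while the MS2 median of 0.26 ppm does not, which is consistent with changing only the precursor settings.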
Thanks. I got 4565 proteins with your workflow. I get it now. Thank you so much. Now I do not need to remeasure a large number of samples. You have no idea how much you made my day. Thanks again.
I also want to transfer our phospho analysis to MSFragger, but I am still a bit lost with the result files. In case you offer trainings, schools, workshops, or anything similar, I would love to take part. If there is a newsletter, please sign me up. I would like to know as much as possible, at least as much as a non-IT person can understand.
Best, Barbara
Hi Barbara,
Glad to hear that FragPipe works well for you now.
We have a lot of tutorials here: https://fragpipe.nesvilab.org/ We have also taught short courses at US HUPO and EMBO. We may offer webinars or online training in the future. Stay tuned!
Best,
Fengchao
These are two images from two different HeLa QC files, acquired with the same machine settings (QTOF), showing just the calibration part. I am pretty sure my problem is somewhere here.
The upper one has issues: only about 1500 proteins were identified. The lower one is fine, with around 4000 proteins, which is what it should look like.
What exactly does this calibration table tell me?
With the vendor software both HeLa runs are fine, with approximately the same protein/peptide IDs. I just do not understand the difference in MSFragger, or the data problem that I am having. Being 15 ppm off in MS1 is a bit much, I would say. (I am also writing back and forth with the vendor about that issue.)
I want to get a feeling for and a better understanding of the output files, so that I can troubleshoot faster and better. The data were acquired with all settings the same, just a few days apart. msconvert was also run on the same PC with the same version; the same is true for the MSFragger versions.
Both log files are attached. If you can give me any hints, or tutorials describing the output in more detail, I would love that. Thanks in advance for helping, Barbara
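As background for reading the calibration table: assuming "MAD" stands for median absolute deviation, the "Median" column describes the systematic offset of the mass errors in ppm and the "MAD" column describes how widely they scatter around that offset. The Python sketch below shows the two statistics on made-up ppm errors; the numbers are hypothetical and not taken from either log file.

```python
from statistics import median


def median_and_mad(errors_ppm):
    """Return (median, median absolute deviation) of a list of ppm errors."""
    med = median(errors_ppm)
    mad = median(abs(e - med) for e in errors_ppm)
    return med, mad


# Hypothetical precursor errors: tightly clustered but shifted by about 24 ppm.
shifted = [22.0, 23.0, 24.0, 25.0, 26.0]
# Hypothetical precursor errors: same spread, centred on zero.
centred = [-2.0, -1.0, 0.0, 1.0, 2.0]

print(median_and_mad(shifted))
print(median_and_mad(centred))
```

A large median with a small MAD, as in the first toy list, points to a consistent shift that calibration can correct, provided the search window is wide enough for the shifted precursors to be matched in the first place.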