Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org
Other

Spectral library generation and/or quantification from MGF #157

Closed vdemichev closed 4 years ago

vdemichev commented 4 years ago

Is your feature request related to a problem? Please describe.
I am generating an .mgf with pseudo-spectra from DIA data (not with DIA-Umpire). I would like to (i) run an MSFragger search on these and quantify the IDs, preferably directly, without converting to .mzML, and (ii) generate a spectral library in OpenSWATH (or similar) format, i.e. a tab-separated table.

What works now: I convert the MGF to mzML with MSConvert and analyse it with quantification and spectral library generation disabled. Running directly from the MGF produces "java.lang.RuntimeException: Cannot parse the MGF title"; turning Freequant on produces "No Spectra was found in data set"; and spectral library generation fails with "Process 'SpecLibGen' finished, exit code: 1" (without any apparent reason). Please find the logs attached: Spectral library generation fail.txt, MGF fail.txt, Quantification fail.txt.

MGF format is like this:

    BEGIN IONS
    TITLE=MS/MS of 443.843 at 19.8791
    PEPMASS=443.843 99.8591
    RTINSECONDS=1192.75
    350.194 77.0706
    350.696 28.9084
    457.253 31.3784
    473.283 29.3342
    561.77 27.528
    602.325 40.552
    617.284 25.3561
    674.343 122.737
    675.312 31.9284
    699.381 80.7659
    700.382 39.8032
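For reference, an MGF block like the one quoted above can be read with a few lines of Python. This is a minimal sketch, not the parser MSFragger or FragPipe uses: it assumes one KEY=value header per line, peak lines of the form "m/z intensity", and an END IONS line closing each block (the excerpt above is cut off before it). The MgfSpectrum and parse_mgf names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MgfSpectrum:
    title: str = ""
    precursor_mz: float = 0.0
    precursor_intensity: Optional[float] = None  # optional 2nd PEPMASS value
    rt_seconds: Optional[float] = None
    peaks: List[Tuple[float, float]] = field(default_factory=list)


def parse_mgf(text: str) -> List[MgfSpectrum]:
    """Collect BEGIN IONS ... END IONS blocks into MgfSpectrum objects."""
    spectra: List[MgfSpectrum] = []
    current: Optional[MgfSpectrum] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = MgfSpectrum()
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global header lines outside a block are ignored here
        elif "=" in line:
            key, value = line.split("=", 1)
            key = key.strip().upper()
            if key == "TITLE":
                current.title = value
            elif key == "PEPMASS":
                parts = value.split()
                current.precursor_mz = float(parts[0])
                if len(parts) > 1:
                    current.precursor_intensity = float(parts[1])
            elif key == "RTINSECONDS":
                current.rt_seconds = float(value)
            # other keys (CHARGE, SCANS, ...) are ignored in this sketch
        else:
            # peak line: "<m/z> <intensity>"
            mz, intensity = line.split()[:2]
            current.peaks.append((float(mz), float(intensity)))
    return spectra
```

Note that the TITLE value is kept as an opaque string; nothing in it is interpreted, which is exactly where the "Cannot parse the MGF title" error above comes from.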

Do you happen to have an example .mgf on which MSFragger is known to work?

Describe the solution you'd like
It would be great if FragPipe were able to quantify from this kind of .mgf. As this is essentially the spectrum at the DIA peak apex, it can be used for quantification (both MS1 and MS2). In fact, these spectra can be preprocessed so that the MS1 intensity is the peak area rather than the peak height.

It would also be very convenient if conversion of the SpectraST .splib output into a tab-separated table were built into FragPipe.
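Writing such a table is simple once the library entries are in memory. The sketch below covers only the output side (reading SpectraST's .splib output is not shown); the Transition record and library_tsv function are hypothetical, and the column names are an assumed subset of an OpenSWATH-style layout, not a confirmed specification.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable

# Assumed subset of OpenSWATH-style column names; check them against the
# documentation of the tool that will read the table.
COLUMNS = [
    "PrecursorMz",
    "ProductMz",
    "LibraryIntensity",
    "NormalizedRetentionTime",
    "ModifiedPeptideSequence",
    "PrecursorCharge",
    "ProteinId",
]


@dataclass
class Transition:
    """One fragment ion of one precursor (a single row of the library table)."""
    precursor_mz: float
    product_mz: float
    library_intensity: float
    normalized_rt: float
    modified_sequence: str
    precursor_charge: int
    protein_id: str


def library_tsv(transitions: Iterable[Transition]) -> str:
    """Render transitions as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for t in transitions:
        writer.writerow([
            t.precursor_mz, t.product_mz, t.library_intensity, t.normalized_rt,
            t.modified_sequence, t.precursor_charge, t.protein_id,
        ])
    return buf.getvalue()
```

Using the csv module rather than joining strings by hand keeps any field that happens to contain a tab or quote correctly escaped.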

anesvi commented 4 years ago

Hi Vadim,

We have limited support for MGF because there are so many formats. However, Fengchao or someone can send you some versions of MGF that MSFragger can work with.

I am not sure whether SpectraST can deal with MGF; I forgot. Guo Ci?

Label-free will not work with MGF (or mzML from those MGF) because the files are missing MS1 data.

Spectral library building in FragPipe works with mzXML generated from MGF (from DIA-Umpire). Not sure why you got an error. Need to take a look.

The final output from FragPipe spec lib module that we use is con_lib.tsv file. It is compatible with both Spectronaut and DIA-NN. Perhaps this is what you need?

Best, Alexey

From: Vadim Demichev notifications@github.com Sent: Sunday, December 8, 2019 2:58 PM To: Nesvilab/FragPipe FragPipe@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [Nesvilab/FragPipe] Spectral library generation and/or quantification from MGF (#157)


fcyu commented 4 years ago

Hi Vadim,

MSFragger currently supports MGF titles ending with scan_number.scan_number.charge. We need the scan number to locate each scan, so your title format (TITLE=MS/MS of 443.843 at 19.8791) is difficult to support. Could you please assign a unique scan number to each scan?
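To illustrate the convention described above, a title can be split into run name, start scan, end scan and charge with a regular expression anchored at the end of the string. This is a hypothetical sketch (parse_scan_title and ScanTitle are illustrative names), not MSFragger's actual title parser; it simply shows why a title such as "MS/MS of 443.843 at 19.8791" yields no scan number.

```python
import re
from typing import NamedTuple, Optional

# "<run name>.<start scan>.<end scan>.<charge>" at the end of a TITLE value.
_TITLE_RE = re.compile(
    r"^(?P<run>.+)\.(?P<start>\d+)\.(?P<end>\d+)\.(?P<charge>\d+)$"
)


class ScanTitle(NamedTuple):
    run: str
    start_scan: int
    end_scan: int
    charge: int  # 0 is used when the precursor charge is unknown


def parse_scan_title(title: str) -> Optional[ScanTitle]:
    """Return the parsed title, or None if it does not end in scan.scan.charge."""
    m = _TITLE_RE.match(title.strip())
    if m is None:
        return None
    return ScanTitle(m["run"], int(m["start"]), int(m["end"]), int(m["charge"]))
```

Because the greedy run-name group backtracks, only the last three dot-separated numbers are taken as scan, scan and charge, so run names may themselves contain digits.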

Best,

Fengchao

vdemichev commented 4 years ago

Hi Alexey,

Many thanks for the swift reply and the explanations!

Label-free will not work with MGF (or mzML from those MGF) because the files are missing MS1 data.

I was thinking about pre-calculating the quantity (MS1 peak integration) and including it in .mgf as PEPMASS=[m/z] [quantity]. But I guess rerunning with a spectral library would be better anyway in most cases.
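A minimal sketch of the pre-calculation described here: integrate the precursor's extracted-ion chromatogram with the trapezoidal rule and write the result as the second PEPMASS value. The trapezoidal rule is an assumed choice of integration method, and trapezoid_area and pepmass_line are hypothetical helpers; nothing in this thread indicates that FragPipe's label-free quantification would read this value.

```python
from typing import Sequence


def trapezoid_area(rts: Sequence[float], intensities: Sequence[float]) -> float:
    """Integrate an extracted-ion chromatogram (RT vs. intensity) with the trapezoidal rule."""
    if len(rts) != len(intensities):
        raise ValueError("rts and intensities must have the same length")
    area = 0.0
    for i in range(1, len(rts)):
        width = rts[i] - rts[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2.0
    return area


def pepmass_line(precursor_mz: float, quantity: float) -> str:
    """Write m/z plus a precomputed quantity as 'PEPMASS=<m/z> <quantity>'."""
    return f"PEPMASS={precursor_mz} {quantity}"
```

For example, a triangular peak sampled at RT 0, 1 and 2 s with intensities 0, 10 and 0 gives an area of 10.0, and pepmass_line(443.843, 10.0) returns "PEPMASS=443.843 10.0".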

The final output from FragPipe spec lib module that we use is con_lib.tsv file. It is compatible with both Spectronaut and DIA-NN. Perhaps this is what you need?

Yes, exactly. I guess I just did not get the .tsv file (only .splib) because of that error.

vdemichev commented 4 years ago

Hi Fengchao,

Thank you! Could you please give an example of such title? Is it OK to omit the charge (MSFragger seems to do deisotoping quite successfully by itself)?

fcyu commented 4 years ago

Hi Vadim,

A title like b1906_293T_proteinID_01A_QE3_122212.1882.1882.3 should be OK. If the precursor charge is unknown, you may use 0. MSFragger will then try all charge states listed in the parameter precursor_charge.
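Following this suggestion, a short script can rewrite every TITLE line to the run.scan.scan.charge form, numbering spectra sequentially from 1 and using charge 0 when the precursor charge is unknown. This is a sketch under assumptions: renumber_titles is a hypothetical helper, the run name is supplied by the caller (for example, the file name without extension), and all other lines, including PEPMASS and RTINSECONDS, are copied unchanged.

```python
def renumber_titles(mgf_text: str, run_name: str, default_charge: int = 0) -> str:
    """Replace each TITLE with '<run>.<scan>.<scan>.<charge>', numbering scans from 1."""
    out_lines = []
    scan = 0
    for line in mgf_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            scan += 1  # one unique scan number per spectrum block
        if stripped.upper().startswith("TITLE="):
            line = f"TITLE={run_name}.{scan}.{scan}.{default_charge}"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

For a run named "pseudo", the first spectrum's title becomes TITLE=pseudo.1.1.0 and the second TITLE=pseudo.2.2.0. If the charge is known per spectrum, it could be written in place of default_charge.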

Best,

Fengchao

vdemichev commented 4 years ago

It works :) Thank you so much!

vdemichev commented 4 years ago

I think I've figured out the problem with spectral library generation: FragPipe cannot launch Python because the path contains spaces. I guess the solution would be to put the interpreter path in quotes, i.e. "C:/Program Files (x86)/.../python.exe" [args] instead of just C:/Program Files (x86)/.../python.exe [args]. Manual conversion to .mrm with SpectraST does work, though.
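The idea behind this fix, sketched in Python for illustration and assuming the failure comes from the path being split at its spaces: keep the interpreter path as a single argument instead of concatenating it into one command string. subprocess.list2cmdline (a long-standing but lightly documented helper in CPython's subprocess module) shows the Windows-style command line that results, with the space-containing path wrapped in double quotes. The interpreter path and script name below are hypothetical. FragPipe itself is a Java application; the analogous approach there is to pass each argument as a separate element of the command list.

```python
import subprocess
from typing import List

# Hypothetical interpreter location containing spaces, as in the report above.
PYTHON_EXE = "C:/Program Files (x86)/Python/python.exe"


def build_argv(python_exe: str, script_args: List[str]) -> List[str]:
    """Keep the interpreter path as one list element so its spaces are not split."""
    return [python_exe, *script_args]


def windows_command_line(argv: List[str]) -> str:
    """Render argv as a Windows command line; arguments with spaces get double quotes."""
    return subprocess.list2cmdline(argv)


if __name__ == "__main__":
    argv = build_argv(PYTHON_EXE, ["script.py"])
    print(windows_command_line(argv))
    # prints: "C:/Program Files (x86)/Python/python.exe" script.py
    # Passing argv (the list) to subprocess.run() avoids manual quoting entirely.
```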