Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running MSFragger and Philosopher - powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

Run error #4

Closed azhang8 closed 6 years ago

azhang8 commented 7 years ago

Hi,

We are currently trying to run a file and it is failing and displaying the following errors:

Executing command:
$> C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\philosopher-source_windows_amd64.exe peptideprophet --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel -2 --database C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\rSP_Hu_Mix1_090716.fasta C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output\ID22689_04_E749_4234_031116.pepXML
Process started
Failed to open input file 'C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzXML'.
WARNING: cannot open data file C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzXML in msms_run_summary tag... trying .mzML ...
Failed to open input file 'C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzML'.
WARNING: CANNOT correct data file C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzML in msms_run_summary tag...
Failed to open input file 'C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzXML'.
WARNING: cannot open data file C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzXML in msms_run_summary tag... trying .mzML ...
Failed to open input file 'C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzML'.
WARNING: CANNOT correct data file C:\Users\DPCF_SuperComp_v2\Desktop\MSFragger\output/ID22689_04_E749_4234_031116.mzML in msms_run_summary tag...

Is there something wrong with the input we are giving MSFragger?

Thanks,

Austin

chhh commented 7 years ago

@azhang8 could you please post the whole output of the run?

Is it just looping with the same messages? Unless you unchecked the checkbox on the MSFragger tab, by that point in the execution (the one you've posted) MSFragger has already finished searching. It's PeptideProphet that gets stuck for some reason.

azhang8 commented 7 years ago

Hi,

Here is the complete output of the run. Is there anything we can do to get peptide prophet to work?

Thanks,

Austin

Will execute 13 commands:
java -jar C:\Program Files\MSFragger_20170103_v2\MSFragger_20170103\MSFragger.jar C:\Program Files\MSFragger-GUI_v2.6\4234_output\fragger.params C:\Program Files\MSFragger-GUI_v2.6\ID22689_04_E749_4234_031116.mzML

java -cp C:\Program Files\MSFragger-GUI_v2.6\MSFragger-GUI.jar umich.msfragger.util.FileMove C:\Program Files\MSFragger-GUI_v2.6\ID22689_04_E749_4234_031116.pepXML C:\Program Files\MSFragger-GUI_v2.6\4234_output\ID22689_04_E749_4234_031116.pepXML

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --init

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe peptideprophet --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel -2 --database C:\Program Files\MSFragger-GUI_v2.6\rSP_Hu_Mix1_090716.fasta C:\Program Files\MSFragger-GUI_v2.6\4234_output\ID22689_04_E749_4234_031116.pepXML

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --clean

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --init

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe proteinprophet --output interact --maxppmdiff 20.0 interact-ID22689_04_E749_4234_031116.pep.xml

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --clean

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --init

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe database --annotate C:\Program Files\MSFragger-GUI_v2.6\rSP_Hu_Mix1_090716.fasta

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe filter --mapmods --sequential --pepxml C:\Program Files\MSFragger-GUI_v2.6\4234_output --protxml C:\Program Files\MSFragger-GUI_v2.6\4234_output\interact.prot.xml

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe report

C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --clean



Executing command:
$> java -jar C:\Program Files\MSFragger_20170103_v2\MSFragger_20170103\MSFragger.jar C:\Program Files\MSFragger-GUI_v2.6\4234_output\fragger.params C:\Program Files\MSFragger-GUI_v2.6\ID22689_04_E749_4234_031116.mzML 
Process started
Peptide index read in 640ms
Selected fragment tolerance 0.02 Da and maximum fragment slice size of 4966.13MB

327469124 fragments to be searched in 1 slices (2.44GB total)
Operating on slice 1 of 1: 
10796ms
    ID22689_04_E749_4234_031116.mzML 
9047ms

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 1453/45449 (3.20%) - 284.40 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 2248/45449 (4.95%) - 156.56 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 3009/45449 (6.62%) - 151.71 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 3776/45449 (8.31%) - 149.66 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 4562/45449 (10.04%) - 153.37 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 5304/45449 (11.67%) - 146.12 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 6045/45449 (13.30%) - 145.01 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 6792/45449 (14.94%) - 148.48 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 7520/45449 (16.55%) - 142.91 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 8282/45449 (18.22%) - 149.15 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 9031/45449 (19.87%) - 146.60 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 9784/45449 (21.53%) - 149.20 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 10539/45449 (23.19%) - 148.68 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 11281/45449 (24.82%) - 146.09 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 12008/45449 (26.42%) - 144.50 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 12716/45449 (27.98%) - 140.73 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 13403/45449 (29.49%) - 136.55 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 14088/45449 (31.00%) - 136.56 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 14762/45449 (32.48%) - 134.37 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 15408/45449 (33.90%) - 128.40 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 16058/45449 (35.33%) - 128.79 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 16735/45449 (36.82%) - 132.51 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 17432/45449 (38.36%) - 136.83 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 18125/45449 (39.88%) - 138.16 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 18839/45449 (41.45%) - 142.37 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 19581/45449 (43.08%) - 146.12 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 20333/45449 (44.74%) - 147.62 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 21104/45449 (46.43%) - 151.35 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 21871/45449 (48.12%) - 150.13 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 22639/45449 (49.81%) - 151.69 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 23413/45449 (51.51%) - 154.34 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 24359/45449 (53.60%) - 186.26 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 25334/45449 (55.74%) - 192.61 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 26317/45449 (57.90%) - 192.97 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 27268/45449 (60.00%) - 189.63 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 28220/45449 (62.09%) - 187.44 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 29224/45449 (64.30%) - 198.97 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 30385/45449 (66.86%) - 226.54 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 31560/45449 (69.44%) - 229.94 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 32742/45449 (72.04%) - 235.65 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 33902/45449 (74.59%) - 229.89 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 35185/45449 (77.42%) - 250.34 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 36839/45449 (81.06%) - 325.65 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 38503/45449 (84.72%) - 330.75 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 40197/45449 (88.44%) - 330.54 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 42563/45449 (93.65%) - 467.40 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 43956/45449 (96.71%) - 275.13 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 45438/45449 (99.98%) - 294.57 spectra/s]

    ID22689_04_E749_4234_031116.mzML 9047ms [progress: 45449/45449 (100.00%) - 50.23 spectra/s]
 - completed 243766ms

Process finished, exit value: 0

Executing command:
$> java -cp C:\Program Files\MSFragger-GUI_v2.6\MSFragger-GUI.jar umich.msfragger.util.FileMove C:\Program Files\MSFragger-GUI_v2.6\ID22689_04_E749_4234_031116.pepXML C:\Program Files\MSFragger-GUI_v2.6\4234_output\ID22689_04_E749_4234_031116.pepXML 
Process started
Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --init 
Process started
INFO[15:21:09] Creating workspace                           
WARN[15:21:09] existing workspace detected, will not overwrite 
INFO[15:21:09] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe peptideprophet --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel -2 --database C:\Program Files\MSFragger-GUI_v2.6\rSP_Hu_Mix1_090716.fasta C:\Program Files\MSFragger-GUI_v2.6\4234_output\ID22689_04_E749_4234_031116.pepXML 
Process started
Failed to open input file 'C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzXML'.
WARNING: cannot open data file C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzXML in msms_run_summary tag... trying .mzML ...
Failed to open input file 'C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzML'.
WARNING: CANNOT correct data file C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzML in msms_run_summary tag...
Failed to open input file 'C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzXML'.
WARNING: cannot open data file C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzXML in msms_run_summary tag... trying .mzML ...
Failed to open input file 'C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzML'.
WARNING: CANNOT correct data file C:\Program Files\MSFragger-GUI_v2.6\4234_output/ID22689_04_E749_4234_031116.mzML in msms_run_summary tag...

 file 1: C:\Program Files\MSFragger-GUI_v2.6\4234_output\ID22689_04_E749_4234_031116.pepXML

 processed altogether 39233 results

INFO: Results written to file: C:\Program Files\MSFragger-GUI_v2.6\4234_output\interact-ID22689_04_E749_4234_031116.pep.xml

  - C:\Program Files\MSFragger-GUI_v2.6\4234_output\interact-ID22689_04_E749_4234_031116.pep.xml

  - Building Commentz-Walter keyword tree...

  - Searching the tree...
  - Linking duplicate entries...
  - Printing results...

Using Decoy Label "rev".
Decoy Probabilities will be reported.
Using non-parametric distributions
 (X! Tandem) (using Tandem's expectation score for modeling)

init with X! Tandem trypsin 

 PeptideProphet  (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64)) AKeller@ISB
 read in 0 1+, 15898 2+, 13530 3+, 6051 4+, 2937 5+, 785 6+, and 24 7+ spectra.
Found 0 Decoys, and 39225 Non-Decoys
WARNING: No decoys with label rev were found in this dataset. reverting to fully unsupervised method.
negmean = 0.0533258

MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ... 
Initialising statistical models ...
INFO[15:21:23] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --clean 
Process started
INFO[15:21:24] Removing workspace                           
WARN[15:21:24] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[15:21:24] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --init 
Process started
INFO[15:21:24] Creating workspace                           
WARN[15:21:24] existing workspace detected, will not overwrite 
INFO[15:21:24] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe proteinprophet --output interact --maxppmdiff 20.0 interact-ID22689_04_E749_4234_031116.pep.xml 
Process started
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64))
 (no FPKM) (using degen pep info)
Reading in C:/Program Files/MSFragger-GUI_v2.6/4234_output/interact-ID22689_04_E749_4234_031116.pep.xml...

did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0

WARNING: no data - output file will be empty
INFO[15:21:27] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --clean 
Process started
INFO[15:21:27] Removing workspace                           
WARN[15:21:27] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[15:21:27] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --init 
Process started
INFO[15:21:27] Creating workspace                           
WARN[15:21:27] existing workspace detected, will not overwrite 
INFO[15:21:27] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe database --annotate C:\Program Files\MSFragger-GUI_v2.6\rSP_Hu_Mix1_090716.fasta 
Process started
INFO[15:21:28] Processing database                          

FATA[15:21:28] Cannot identify the database type            

Process finished, exit value: 1

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe filter --mapmods --sequential --pepxml C:\Program Files\MSFragger-GUI_v2.6\4234_output --protxml C:\Program Files\MSFragger-GUI_v2.6\4234_output\interact.prot.xml 
Process started
INFO[15:21:29] Processing peptide identification files      

INFO[15:21:32] 1+ Charge profile                             decoy=0 target=0
INFO[15:21:32] 2+ Charge profile                             decoy=0 target=15904
INFO[15:21:32] 3+ Charge profile                             decoy=0 target=13530
INFO[15:21:32] 4+ Charge profile                             decoy=0 target=6051
INFO[15:21:32] 5+ Charge profile                             decoy=0 target=2937
INFO[15:21:32] 6+ Charge profile                             decoy=0 target=785

INFO[15:21:32] Database search results                       ions=23212 peptides=21015 psms=39233

INFO[15:21:32] Converged to 0.00 % FDR with 39233 PSMs       decoy=0 threshold=0 total=39233

INFO[15:21:32] Converged to 0.00 % FDR with 21015 Peptides   decoy=0 threshold=0 total=21015

INFO[15:21:33] Converged to 0.00 % FDR with 23212 Ions       decoy=0 threshold=0 total=23212

FATA[15:21:33] No Protein groups detected, check your file and try again 

Process finished, exit value: 1

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe report 
Process started
INFO[15:21:33] Creating PSM report                          
INFO[15:21:33] Creating peptide Ion report                  
INFO[15:21:33] Creating peptide report                      
INFO[15:21:33] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe workspace --clean 
Process started
INFO[15:21:34] Removing workspace                           
WARN[15:21:34] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[15:21:34] Done                                         

Process finished, exit value: 0

=========================
===
===        Done
===
=========================
jspaezp commented 7 years ago

As far as I can tell, the problem is that the input file, C:\Program Files\MSFragger-GUI_v2.6\ID22689_04_E749_4234_031116.mzML, is located in a directory other than the output directory. When Philosopher runs on the output directory, C:\Program Files\MSFragger-GUI_v2.6\4234_output/, it cannot find the corresponding mzML file because it is not looking in the right place. As a manual workaround, you could set the input and output directories to the same folder (until the developers implement searching for the files in the originally specified input location).
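To confirm this diagnosis before a run, one could check whether a spectra file sits next to the pepXML file. The following is a minimal sketch, not part of MSFragger-GUI or Philosopher: the function name `find_sibling_spectra` is hypothetical, and it assumes the tool looks for a file with the same base name and an .mzXML or .mzML extension, as the warnings in the log suggest.

```python
from pathlib import Path

# The two extensions the PeptideProphet warnings show being tried, in that order.
SPECTRA_EXTENSIONS = (".mzXML", ".mzML")


def find_sibling_spectra(pepxml_path):
    """Return the spectra file located next to a pepXML file, or None.

    Hypothetical helper: it only mirrors the lookup implied by the log
    (same directory, same base name, different extension).
    """
    pepxml = Path(pepxml_path)
    for ext in SPECTRA_EXTENSIONS:
        candidate = pepxml.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None
```

Calling it on the pepXML path in the output directory would return None for the run above, since the mzML file was left in the parent folder.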

In addition I can note that when it runs

Executing command:
$> C:\Program Files\MSFragger-GUI_v2.6\philosopher-source_windows_amd64.exe database --annotate C:\Program Files\MSFragger-GUI_v2.6\rSP_Hu_Mix1_090716.fasta 
Process started
INFO[15:21:28] Processing database                          

FATA[15:21:28] Cannot identify the database type            

Process finished, exit value: 1

it does not process the database to generate the decoys, which would explain the later lack of decoy matches:

INFO[15:21:32] 1+ Charge profile                             decoy=0 target=0
INFO[15:21:32] 2+ Charge profile                             decoy=0 target=15904
INFO[15:21:32] 3+ Charge profile                             decoy=0 target=13530
INFO[15:21:32] 4+ Charge profile                             decoy=0 target=6051
INFO[15:21:32] 5+ Charge profile                             decoy=0 target=2937
INFO[15:21:32] 6+ Charge profile                             decoy=0 target=785

INFO[15:21:32] Database search results                       ions=23212 peptides=21015 psms=39233

INFO[15:21:32] Converged to 0.00 % FDR with 39233 PSMs       decoy=0 threshold=0 total=39233

INFO[15:21:32] Converged to 0.00 % FDR with 21015 Peptides   decoy=0 threshold=0 total=21015

INFO[15:21:33] Converged to 0.00 % FDR with 23212 Ions       decoy=0 threshold=0 total=23212

FATA[15:21:33] No Protein groups detected, check your file and try again 

Process finished, exit value: 1

I hope this can be easily fixed.

chhh commented 7 years ago

Those warnings about mzML/mzXML are not critical. The raw files are not really needed, so it's OK. On Linux it should actually create symlinks to the raw files in the output directory; on Windows, creating symlinks requires special permissions, and we decided against copying whole files.
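As an illustration of the symlink behaviour described here, this is a rough sketch of "link if possible, otherwise skip without copying". It is not the GUI's actual code; the function name `link_spectra_into` is made up for this example.

```python
import os
from pathlib import Path


def link_spectra_into(output_dir, spectra_path):
    """Try to symlink a spectra file into output_dir.

    Returns the link path on success, or None when the platform refuses
    to create the symlink. The file is never copied.
    """
    spectra = Path(spectra_path).resolve()
    link = Path(output_dir) / spectra.name
    if link.exists() or link.is_symlink():
        # Something with that name is already there; leave it alone.
        return link
    try:
        os.symlink(spectra, link)
    except (OSError, NotImplementedError):
        # Typical on Windows without the symlink privilege.
        return None
    return link
```

When the function returns None, the downstream warnings about the missing mzML/mzXML file are expected and, as noted, harmless.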

The program won't create a database with decoys for you (philosopher database --annotate serves a different purpose), you're expected to provide a database yourself. Philosopher does have the tools bundled that can download a db and append decoys for you.

The problem is that "philosopher database --annotate" could not parse the DB. It most likely didn't like something about protein description strings. We're working on a fix.
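Since the database has to contain decoys already, it may help to count them before starting a run. The sketch below is only an illustration: `count_decoys` is a hypothetical helper, and it assumes that a decoy is any entry whose header starts with the label passed to `--decoy` (here `rev`), compared case-sensitively. That matching rule is an assumption, not documented behaviour.

```python
def count_decoys(fasta_lines, decoy_label="rev"):
    """Return (decoys, targets) counted from the header lines of a FASTA file.

    Assumption: an entry is a decoy when its header, after the leading '>',
    starts with decoy_label (case-sensitive).
    """
    decoys = targets = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        if line[1:].startswith(decoy_label):
            decoys += 1
        else:
            targets += 1
    return decoys, targets
```

For example, `count_decoys(open("my_db.fasta"))` (with your own file name) returning zero decoys would be consistent with the "Found 0 Decoys" line in the PeptideProphet output.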

azhang8 commented 7 years ago

Hi,

Thanks for the help. We also were wondering whether or not setting the additional modifications to 0, as done in the screenshot below, means that any modification is open to be searched for. Or should they be unchecked?

[Screenshot: Additional Modifications panel with values set to 0]

chhh commented 7 years ago

Additional Modifications are essentially fixed modifications: each one modifies the mass of an amino acid (or terminus) by a fixed value.

The checkboxes are there for convenience - if you uncheck a box, it won't be included in the parameter file that is written for MSFragger, but the delta mass value won't be forgotten, so you can reactivate it later. Having a value of zero is the same as unchecking the box.

The same goes for the checkboxes of Variable Mods. They're there for convenience, so that you don't have to retype specificity and exact mass values in case you just want to try turning them off temporarily.

If you fixed Cysteines chemically, I'd recommend setting the additional mod for C to 57.021464, otherwise all the mass-shifts for Cysteine-containing peptides will be off by 57, but it's up to you.
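To make the "off by 57" remark concrete, here is a small worked example. It is a sketch only: `unaccounted_shift` is a hypothetical helper and the peptide string is invented. It computes the mass offset that is left unexplained when cysteines were alkylated but the search uses a different fixed mass on C.

```python
CARBAMIDOMETHYL = 57.021464  # delta mass for C quoted in the comment above


def unaccounted_shift(peptide, fixed_mod_on_c=0.0):
    """Mass offset (Da) not explained by the search settings.

    Each cysteine carries CARBAMIDOMETHYL in the sample, while the search
    only accounts for fixed_mod_on_c per cysteine.
    """
    return peptide.count("C") * (CARBAMIDOMETHYL - fixed_mod_on_c)


if __name__ == "__main__":
    peptide = "ACDCK"  # invented peptide with two cysteines
    print(unaccounted_shift(peptide))                   # additional mod left at 0
    print(unaccounted_shift(peptide, CARBAMIDOMETHYL))  # additional mod set for C
```

With the additional mod left at zero, every cysteine adds about 57.02 Da to the observed mass shift; with it set to 57.021464 the offset disappears.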

For each run, if MSFragger was set to run, you will find the .param file that was used for the search in the output directory. You can also click the Save button at the top of this page to save the parameter file separately; that way you can check the effects of the various options on this configuration page. You can also load previously saved param files (or param files from previously used output directories).

prvst commented 7 years ago

If you inspect the log carefully you will see that not only Philosopher failed but also PeptideProphet and ProteinProphet. It seems that the problem stems from a badly formatted database file.

@azhang8; if you share your database file I may be able to guide you on how to fix it.

chhh commented 7 years ago

@azhang8 Has the issue been resolved?

prvst commented 7 years ago

@azhang8 I replied to your e-mail on Jun 22 with the instructions you have to follow, please check your e-mail.

remigs commented 7 years ago

Hi, I have similar issues with a fasta file. Please see below. Tried to generate a combined forward and reverse, did it in this format:

>sp|Q13542|4EBP2_HUMAN Eukaryotic translation initiation factor 4E-binding protein 2 OS=Homo sapiens GN=EIF4EBP2 PE=1 SV=1
MSSSAGSGHQPSQSRAIPTRTVAISDAAQLPHDYCTTPGGTLFSTTPGGTRIIYDRKFLLDRRNSPMAQTPPCHLPNIPGVTSPGTLIEDSKVEVNNLNNLNNHDRKHAVGDDAQFEMDI
>sp|REV_Q13542|4EBP2_HUMAN Eukaryotic translation initiation factor 4E-binding protein 2 OS=Homo sapiens GN=EIF4EBP2 PE=1 SV=1
PRPLPNRCFPGDNSEHIQDAIDAPQFLYGDHDYARPMKDRGISSLSVRLLTPTSPGTRGPSRTAMTPHNLSADTSTHNDECTTAVIVKLNIKNIVEQLVSNAGDNFHAPTGMGSSTQGIQ

It does not recognize the reverse, can anyone help?

Also got a couple of warnings re peptide/protein prophets. Is the issue with reading rev hits causing it? Please see below:

Will execute 12 commands:
java -jar -Xmx8G C:\Users\rserwa\Desktop\working directory fragger\MSFragger.jar C:\Users\rserwa\Desktop\working directory fragger\outcome\fragger.params C:\Users\rserwa\Desktop\working directory fragger\outcome\RS_MCF-1.mzML 
philosopher_windows_amd64.exe workspace --init 
philosopher_windows_amd64.exe peptideprophet --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel 2 --database C:\Users\rserwa\Desktop\working directory fragger\human_fragger.fasta C:\Users\rserwa\Desktop\working directory fragger\outcome\RS_MCF-1.pepXML 
philosopher_windows_amd64.exe workspace --clean 
philosopher_windows_amd64.exe workspace --init 
philosopher_windows_amd64.exe proteinprophet --output interact --maxppmdiff 20 interact-RS_MCF-1.pep.xml 
philosopher_windows_amd64.exe workspace --clean 
philosopher_windows_amd64.exe workspace --init 
philosopher_windows_amd64.exe database --annotate C:\Users\rserwa\Desktop\working directory fragger\human_fragger.fasta 
philosopher_windows_amd64.exe filter --sequential --mapmods --pepxml C:\Users\rserwa\Desktop\working directory fragger\outcome --protxml C:\Users\rserwa\Desktop\working directory fragger\outcome\interact.prot.xml 
philosopher_windows_amd64.exe report 
philosopher_windows_amd64.exe workspace --clean 
~~~~~~~~~~~~~~~~~~~~~~
Executing command:
$> java -jar -Xmx8G C:\Users\rserwa\Desktop\working directory fragger\MSFragger.jar C:\Users\rserwa\Desktop\working directory fragger\outcome\fragger.params C:\Users\rserwa\Desktop\working directory fragger\outcome\RS_MCF-1.mzML 
Process started
Peptide index read in 107ms
Selected fragment tolerance 0.02 Da and maximum fragment slice size of 4985.45MB
171161740 fragments to be searched in 1 slices (1.27GB total)
Operating on slice 1 of 1: 
4167ms
    RS_MCF-1.mzML 
8245ms
    RS_MCF-1.mzML 8245ms [progress: 2152/29397 (7.32%) - 426.14 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 4216/29397 (14.34%) - 409.44 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 6293/29397 (21.41%) - 406.86 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 8393/29397 (28.55%) - 409.92 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 10310/29397 (35.07%) - 380.06 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 12389/29397 (42.14%) - 410.63 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 14484/29397 (49.27%) - 417.25 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 16735/29397 (56.93%) - 444.33 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 19094/29397 (64.95%) - 467.50 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 21529/29397 (73.24%) - 478.77 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 24137/29397 (82.11%) - 520.35 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 27113/29397 (92.23%) - 585.02 spectra/s]
    RS_MCF-1.mzML 8245ms [progress: 29397/29397 (100.00%) - 553.16 spectra/s]
 - completed 64933ms
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe workspace --init 
Process started
INFO[16:10:13] Creating workspace                           
INFO[16:10:13] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe peptideprophet --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel 2 --database C:\Users\rserwa\Desktop\working directory fragger\human_fragger.fasta C:\Users\rserwa\Desktop\working directory fragger\outcome\RS_MCF-1.pepXML 
Process started
Failed to open input file 'C:\Users\rserwa\Desktop\working directory fragger\outcome/RS_MCF-1.mzXML'.
WARNING: cannot open data file C:\Users\rserwa\Desktop\working directory fragger\outcome/RS_MCF-1.mzXML in msms_run_summary tag... trying .mzML ...
SUCCESS: CORRECTED data file C:\Users\rserwa\Desktop\working directory fragger\outcome/RS_MCF-1.mzML in msms_run_summary tag...
 file 1: C:\Users\rserwa\Desktop\working directory fragger\outcome\RS_MCF-1.pepXML
 processed altogether 23312 results
INFO: Results written to file: C:\Users\rserwa\Desktop\working directory fragger\outcome\interact-RS_MCF-1.pep.xml
  - C:\Users\rserwa\Desktop\working directory fragger\outcome\interact-RS_MCF-1.pep.xml
  - Building Commentz-Walter keyword tree...
  - Searching the tree...
  - Linking duplicate entries...
  - Printing results...
Using Decoy Label "rev".
Decoy Probabilities will be reported.
Using non-parametric distributions
 (X! Tandem) (using Tandem's expectation score for modeling)
init with X! Tandem trypsin 
 PeptideProphet  (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64)) AKeller@ISB
 read in 0 1+, 11799 2+, 9109 3+, 2010 4+, 324 5+, 48 6+, and 17 7+ spectra.
Found 0 Decoys, and 23307 Non-Decoys
WARNING: No decoys with label rev were found in this dataset. reverting to fully unsupervised method.
negmean = 0.0533258

MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN
INFO: Processing standard MixtureModel ... 
Initialising statistical models ...
INFO[16:10:19] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe workspace --clean 
Process started
INFO[16:10:20] Removing workspace                           
WARN[16:10:20] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[16:10:20] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe workspace --init 
Process started
INFO[16:10:20] Creating workspace                           
WARN[16:10:20] existing workspace detected, will not overwrite 
INFO[16:10:20] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe proteinprophet --output interact --maxppmdiff 20 interact-RS_MCF-1.pep.xml 
Process started
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64))
 (no FPKM) (using degen pep info)
Reading in C:/Users/rserwa/Desktop/working directory fragger/outcome/interact-RS_MCF-1.pep.xml...
did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0.05

WARNING: no data - output file will be empty
INFO[16:10:22] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe workspace --clean 
Process started
INFO[16:10:22] Removing workspace                           
WARN[16:10:22] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[16:10:22] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe workspace --init 
Process started
INFO[16:10:22] Creating workspace                           
WARN[16:10:22] existing workspace detected, will not overwrite 
INFO[16:10:22] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe database --annotate C:\Users\rserwa\Desktop\working directory fragger\human_fragger.fasta 
Process started
INFO[16:10:23] Processing database                          

INFO[16:10:26] Done                                         
Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe filter --sequential --mapmods --pepxml C:\Users\rserwa\Desktop\working directory fragger\outcome --protxml C:\Users\rserwa\Desktop\working directory fragger\outcome\interact.prot.xml 
Process started
INFO[16:10:26] Processing peptide identification files      
INFO[16:10:28] 1+ Charge profile                             decoy=0 target=0
INFO[16:10:28] 2+ Charge profile                             decoy=0 target=11804
INFO[16:10:28] 3+ Charge profile                             decoy=0 target=9109
INFO[16:10:28] 4+ Charge profile                             decoy=0 target=2010
INFO[16:10:28] 5+ Charge profile                             decoy=0 target=324
INFO[16:10:28] 6+ Charge profile                             decoy=0 target=48
INFO[16:10:28] Database search results                       ions=18462 peptides=16274 psms=23312
INFO[16:10:28] Converged to 0.00 % FDR with 23312 PSMs       decoy=0 threshold=0 total=23312
INFO[16:10:28] Converged to 0.00 % FDR with 16274 Peptides   decoy=0 threshold=0 total=16274
INFO[16:10:28] Converged to 0.00 % FDR with 18462 Ions       decoy=0 threshold=0 total=18462
FATA[16:10:28] No Protein groups detected, check your file and try again 
Process finished, exit value: 1
Executing command:
$> philosopher_windows_amd64.exe report 
Process started
INFO[16:10:28] Creating PSM report                          
INFO[16:10:28] Creating peptide Ion report                  
INFO[16:10:29] Creating peptide report                      
INFO[16:10:29] Done                                         

Process finished, exit value: 0
Executing command:
$> philosopher_windows_amd64.exe workspace --clean 
Process started
INFO[16:10:29] Removing workspace                           
WARN[16:10:29] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[16:10:29] Done                                         
Process finished, exit value: 0
=========================
===
===        Done
===
=========================

Thanks in advance, Remi

chhh commented 7 years ago

@remigs Hi Remi, Try doing rev_sp|ABC123... - i.e. lowercase prefix. There are many tools involved here and the older TPP tools wrapped in Philosopher expect the reverse identifier to be a prefix to the whole string, I guess.
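For anyone hitting the same `decoy=0` lines in the filter log, here is a minimal sketch of a check for how decoy entries are labelled in a FASTA file before running the pipeline. It assumes the decoy tag should be the lowercase `rev_` prefix at the very start of the header, as suggested above. The function name `count_decoys` and the example headers (other than the `rev_sp|ABC123` pattern) are hypothetical, and this is not part of MSFragger or Philosopher.

```python
from io import StringIO

DECOY_PREFIX = "rev_"  # lowercase prefix suggested in the reply above


def count_decoys(fasta_lines, prefix=DECOY_PREFIX):
    """Return (targets, decoys, suspicious) for an iterable of FASTA lines.

    'suspicious' collects headers that contain the decoy tag somewhere
    (in any letter case) but do not start with the exact lowercase prefix.
    """
    targets = 0
    decoys = 0
    suspicious = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if header.startswith(prefix):
            decoys += 1
        else:
            targets += 1
            if prefix.lower() in header.lower():
                suspicious.append(header)
    return targets, decoys, suspicious


# Hypothetical three-entry database used only for illustration.
example = StringIO(
    ">sp|ABC123|PROT_HUMAN example target\n"
    "MKTAYIAKQR\n"
    ">rev_sp|ABC123|PROT_HUMAN example decoy\n"
    "RQKAIYATKM\n"
    ">sp|REV_ABC123|PROT_HUMAN misplaced decoy tag\n"
    "RQKAIYATKM\n"
)
targets, decoys, suspicious = count_decoys(example)
print(targets, decoys, suspicious)
```

To check a real database, pass an open file handle instead of the `StringIO` object. A decoy count of zero, or a non-empty `suspicious` list, would be consistent with the symptom in the log above.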

remigs commented 7 years ago

Thanks Dmitry, it worked!

remigs commented 7 years ago

Greetings MSFragger developers, Thanks to your suggestions about fasta file formatting I have now been able to test the software a bit. Below are my comments and some questions. Just to explain my interest in MSFragger, I am a chemoproteomics person and for my research it is essential to be able to find modifications not defined a priori. From your recent Nat Comms paper I understood that this is doable with MSFragger and I am very enthusiastic about applying this tool routinely in many of my projects. By the way, thanks for developing it!

So far I have set up a couple of open searches in MSFragger GUI using the preconfigured settings (as below)

I searched 3 different mzML files but with limited success. I am impressed by the processing speed of the software but cannot find modified peptides which I know are there. I understand that this may be partially due to the sub-optimal parameters I use, and would therefore like to ask for expert opinion on which parameters I should change with respect to the 3 examples listed below. Since the program runs so fast, I guess running multiple searches with changed parameters is a viable option; is this something you would recommend? For the examples used in Nat Comms, were all open searches run with the same set of parameters?

I had previously searched each of the 3 test files using other software packages (with exactly the same forward+decoy fasta) and found plenty of modified peptides by simply setting variable modifications corresponding to the Dmasses which I introduced to peptides chemically or biochemically.

My samples were:

  1. A lysate obtained from human cells grown in culture in the presence of azidohomoalanine (a methionine surrogate of Dmass -4.99).

Using the preconfigured settings I was able to see 11 instances of this Dmass, whereas other software returned over 60 well-assigned spectra. I was able to easily find the PSMs in the output tsv tables, great! In terms of total number of PSMs, MSFragger reported 1.6k and the other software twice as many at FDR 0.01.

I changed Precursor True Tolerance from 20 ppm to 5 ppm (I fix mass on an internal calibrant during acquisition and rarely observe PSMs with errors >5 ppm), but that did not help to bring up the number of Dmass -4.99 modifications found.

I then thought I would see if I could boost the number of Dmass -4.99 PSMs by setting this Dmass as a variable modification. This is not at all what I would like to do, as I mainly care about finding and validating unpredictable modifications, but just as an exercise for the software I decided to do it. I tried to run the search with this modification added (as below) but got the message below and the GUI hung.

Peptide index written in 2293ms
Selected fragment tolerance 0.02 Da and maximum fragment slice size of 10779.80MB
Exception in thread "main" java.lang.reflect.InvocationTargetException
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
                at java.lang.reflect.Method.invoke(Unknown Source)
                at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -248
                at java.util.concurrent.FutureTask.report(Unknown Source)
                at java.util.concurrent.FutureTask.get(Unknown Source)
                at e.b(Unknown Source)
                at MSFragger.main(Unknown Source)
                ... 5 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -248
                at f.call(Unknown Source)
                at java.util.concurrent.FutureTask.run(Unknown Source)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                at java.lang.Thread.run(Unknown Source)
427004992 fragments to be searched in 1 slices (3.18GB total)
Operating on slice 1 of 1:

Tried three times, all failed. How can this be fixed?

  2. A TMT-modified sample, Dmass +229.16 expected on 98%+ of ions, as verified by previous searches.

Using the preconfigured settings I was able to see 1 instance of this Dmass and only 12 PSMs in total. I then set a variable modification of Dmass +229.16 as shown below and was happy not to receive any error messages (in contrast to what happened for the Met->Aha Dmass in example 1).

This search returned ca. 6k PSMs, about 50% of the number returned by other software at FDR 0.01. What surprised me though was that the variable modification does not appear as a modification in the histogram output; in other words, the top peak is shifted to 0 (as shown below).

Is that always the case? What would this picture look like if 45% of all PSMs were TMT modified, 49% unmodified, and 6% modified with both TMT and Met(ox)?

  3. A lysate obtained from human cells grown in culture in the presence of a lipid that gets metabolically appended to N-terminal glycines on proteins (Dmass +463.2907). This is the search of most interest to me and I was hoping for this one to work.

Using the preconfigured settings I was able to see 2 instances of this Dmass, whereas other software returned over 100 well-assigned spectra when it was set as a variable modification. In terms of total number of PSMs, MSFragger reported ca. 6k and the other software ca. 9k at FDR 0.01. I am yet to overlay protein and peptide IDs, but looking briefly at the protein IDs I can see that MSFragger did a good job of finding proteins that should be there based on the sample specification.

The fact that MSFragger was able to find less than 2% of the Dmass +463.2907 PSMs is alarming. Could it be that the software does not operate correctly near the borders of the search window? The default search limits are Dmasses of +/-500 Da and my modification is +463.2907, and if combined with cysteine carbamidomethylation or Met(ox) the total Dmass would fall beyond the preset window.
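The window arithmetic behind this concern can be checked with a short sketch. The +/-500 Da limit and the +463.2907 Dmass are taken from the post. The carbamidomethyl and Met(ox) masses are the standard monoisotopic values and are assumptions here, since the thread does not state them. The helper `inside_window` is hypothetical.

```python
WINDOW_DA = 500.0           # default open-search limit quoted in the post (+/- 500 Da)
LIPID = 463.2907            # Dmass of the lipid modification, from the post
CARBAMIDOMETHYL = 57.02146  # assumption: standard monoisotopic mass, not from the thread
MET_OX = 15.99491           # assumption: standard monoisotopic mass, not from the thread


def inside_window(total_dmass, window=WINDOW_DA):
    """True if the total unexplained mass shift fits within +/- window."""
    return abs(total_dmass) <= window


combos = {
    "lipid": LIPID,
    "lipid + Met(ox)": LIPID + MET_OX,
    "lipid + carbamidomethyl": LIPID + CARBAMIDOMETHYL,
}
results = {name: inside_window(total) for name, total in combos.items()}
for name, total in combos.items():
    print(f"{name}: {total:.4f} Da, inside window: {results[name]}")
```

With these assumed masses, only the combination with carbamidomethylation exceeds 500 Da; the Met(ox) combination stays inside. Whether either one contributes to the Dmass at all depends on whether it is already specified as a modification in the search parameters.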

I therefore decided to increase the Precursor Mass Parameter to 1000 ABS and got a very mild increase in Dmass +463.2907 PSMs: I could see 3 instead of 2 instances. The total number of PSMs did not change. Furthermore, I was not able to inspect data for PSMs associated with Dmass >500, as neither the histogram output nor the tsv tables contain information about Dmass >500. Have I set the search up properly? How can I see information about Dmass >500?

I then added Dmass +463.2907 to variable modifications just to see if the software improves on detection of these peptides and this resulted in an interesting observation.

There was only one Dmass +463.2907 PSM reported BUT over 40 Dmass -463.2907 PSMs (as shown below).

Fair enough, I thought, some bug that may be fixable? In any case, I do not mind looking on the other side of the x-axis, but the problem was bigger than I initially realised. Even though I could see the modification I expected on the output histograms, I could not find the corresponding IDs in the output tables. I double-checked all of them and found no indication of my Dmasses. It seems that the only PSMs listed are unmodified ones and those for which Dmasses could be matched to their descriptions. How can I access information about the PSMs with unnamed Dmasses? Furthermore, this example illustrates that, using the preset open search settings, the Dmass +463.2907 PSMs cannot be efficiently identified by MSFragger without setting it as a variable modification. A couple of my projects deal with identifying protein adducts of similar type and mass. Please advise on settings to improve detection of these PTMs.

Lastly (a long shot question, and I will gladly accept a simple no to that one), I wanted to ask whether a diagnostic fragment ion feature would be easily implementable in MSFragger and whether it is possibly already available for testing. To explain the application: if a small molecule that modifies proteins (either enzymatically or chemically) has a metabolically stable region in its structure, and if that structural element happens to produce characteristic HCD fragmentation product(s), while the molecule also has heavily metabolizable structural regions, then the characteristic HCD ion(s) can be used to aid in identifying protein adducts of a spectrum of metabolites derived from that small molecule. Please let me know if you would like to know more about it.

Sorry for the length of this post but I tried to explain as plainly as possible the issues I encounter and hope to hear from you and be able to use the full potential of MSFragger soon. Best regards, Remi

remigs commented 7 years ago

Sorry, the pictures did not copy at all and some elements of the text did not copy properly. I am working from an iPad and will try to add them from a PC when I get access to one.

remigs commented 7 years ago

The attached PDF file contains all the screenshots I refer to in my post: observations about fragger.pdf

trayambakbasak commented 6 years ago

Hi, could anyone please share instructions for running the MSFragger GUI 3.0? I have installed it but could not get the MSFragger.jar file and thus could not run the GUI to test.

Thanks, Trayambak.

andytyk commented 6 years ago

@remigs Sorry, it seems that your issue was neglected amidst all the other posts. Once something is specified as a variable modification, then it no longer appears as a delta mass and will show up as a 0 delta mass PSM with the variable mod. Delta mass only refers to masses that are not accounted for using variable modifications.
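As a rough illustration of the point above, here is a minimal sketch of how the reported delta mass changes once a modification is declared as variable. It only shows the bookkeeping (observed mass minus everything the search already accounts for) and is not MSFragger's actual code. The function `residual_dmass` and the peptide mass are hypothetical; the +229.16 value is the TMT Dmass quoted earlier in the thread.

```python
TMT = 229.16  # Dmass quoted earlier in the thread


def residual_dmass(observed_mass, peptide_mass, variable_mods=()):
    """Mass left unexplained after the peptide and any matched variable mods."""
    return observed_mass - (peptide_mass + sum(variable_mods))


peptide_mass = 1000.50         # hypothetical unmodified peptide mass
observed = peptide_mass + TMT  # hypothetical precursor carrying one TMT tag

# Open search without the variable mod: the tag shows up as a delta mass.
open_only = residual_dmass(observed, peptide_mass)

# Same precursor with TMT declared as a variable mod: nothing is left over.
with_var_mod = residual_dmass(observed, peptide_mass, variable_mods=(TMT,))

print(round(open_only, 2), round(with_var_mod, 2))
```

Under this bookkeeping, a PSM whose only extra mass is a declared variable modification lands in the 0 bin of the histogram, which matches the shifted top peak described in the TMT example.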

Open searching is still an emerging method, so there's no catch-all set of parameters that we can recommend. We've had others who had great success in using MSFragger for chemoproteomics, so if you're interested in debugging with us to get MSFragger working for your chemistry, just send me an e-mail at andykong at umich.edu with your data so we can take a look together.

@trayambakbasak There should be a link in the GUI for downloading MSFragger.jar from our Tech Transfer site.