Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

No Protein groups detected, check your file and try again #1

Closed: videlc closed this issue 7 years ago.

videlc commented 7 years ago

Hey,

I gave MSFragger GUI a try with the standard configuration, using open search + Philosopher. Everything is fine until the protein grouping step. I am using a concatenated target/decoy database from UniProt (canonical + isoforms: Swiss-Prot + TrEMBL, 03/2017), and my decoys are flagged with the "rev_" tag. Is there something I need to change in the FASTA?

INFO[21:01:10] Processing peptide identification files      

INFO[21:01:13] 1+ Charge profile                             decoy=0 target=0
INFO[21:01:13] 2+ Charge profile                             decoy=2019 target=18584
INFO[21:01:13] 3+ Charge profile                             decoy=2386 target=15240
INFO[21:01:13] 4+ Charge profile                             decoy=815 target=4289
INFO[21:01:13] 5+ Charge profile                             decoy=188 target=950
INFO[21:01:13] 6+ Charge profile                             decoy=94 target=348
INFO[21:01:13] Database search results                       ions=31664 peptides=29682 psms=45464

INFO[21:01:14] Converged to 0.00 % FDR with 45464 PSMs       decoy=0 threshold=0 total=45464

INFO[21:01:14] Converged to 0.00 % FDR with 29682 Peptides   decoy=0 threshold=0 total=29682
INFO[21:01:14] Converged to 0.00 % FDR with 31664 Ions       decoy=0 threshold=0 total=31664

FATA[21:01:14] No Protein groups detected, check your file and try again 

Process finished, exit value: 1

Cheers, Vivian
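Why "Converged to 0.00 % FDR ... decoy=0" means every PSM passes can be illustrated with the common target-decoy estimate FDR ≈ decoys / targets. The sketch below assumes that simple estimator and a hypothetical (score, is_decoy) list; Philosopher's exact procedure may differ. The point is only that when no hit is counted as a decoy, every score cut-off has an estimated FDR of 0, so nothing is filtered out.

```python
from typing import List, Tuple


def estimate_fdr(n_decoy: int, n_target: int) -> float:
    """Simple target-decoy FDR estimate (assumed here): decoys / targets."""
    return n_decoy / n_target if n_target else 0.0


def accepted_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> int:
    """Return how many of the best-scoring PSMs pass max_fdr.

    psms holds (score, is_decoy) pairs; a higher score is better.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    decoys = targets = 0
    accepted = 0
    for rank, (_, decoy) in enumerate(ranked, start=1):
        if decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cut-off whose estimated FDR is still acceptable.
        if estimate_fdr(decoys, targets) <= max_fdr:
            accepted = rank
    return accepted


# With no PSM labeled as decoy, every cut-off has an estimated FDR of 0,
# so the whole list passes, as in "Converged to 0.00 % FDR ... decoy=0".
no_decoys = [(float(score), False) for score in range(1000)]
print(accepted_at_fdr(no_decoys))  # 1000
```

In other words, a 0.00 % FDR with decoy=0 is a symptom of missing decoys, not a sign of a clean result.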

chhh commented 7 years ago

Hi @videlc, I'll forward your question to the person responsible for Philosopher development. In the meantime, could you please post more output from before this last stage where it fails?

Most likely the problem is with annotations in the FASTA file. Could you put the database you used somewhere online and send me a link? Or maybe just attach a zipped copy to a response here (file size permitting).

videlc commented 7 years ago

Hi @chhh, thank you for your answer.

Here is the complete MSFragger GUI log:

Executing command:
$> java -jar -Xmx8G C:\Users\delv1901\Documents\MSFragger_20170103\MSFragger.jar C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp\fragger.params C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp\276vivian_ALG.mgf 
Process started
Peptide index read in 891ms

Selected fragment tolerance 0,02 Da and maximum fragment slice size of 4955,80MB
416196452 fragments to be searched in 1 slices (3,10GB total)
Operating on slice 1 of 1: 
13735ms
    276vivian_ALG.mgf 
4953ms

    276vivian_ALG.mgf 4953ms [progress: 3593/104648 (3,43%) - 714,17 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 6021/104648 (5,75%) - 484,05 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 8611/104648 (8,23%) - 508,44 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 11002/104648 (10,51%) - 473,75 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 13279/104648 (12,69%) - 445,60 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 15460/104648 (14,77%) - 434,81 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 17371/104648 (16,60%) - 375,15 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 19308/104648 (18,45%) - 386,16 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 21142/104648 (20,20%) - 362,24 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 23001/104648 (21,98%) - 367,25 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 24739/104648 (23,64%) - 341,19 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 26533/104648 (25,35%) - 358,73 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 27997/104648 (26,75%) - 291,87 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 29803/104648 (28,48%) - 350,20 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 31578/104648 (30,18%) - 351,69 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 33218/104648 (31,74%) - 321,00 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 34814/104648 (33,27%) - 314,24 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 36350/104648 (34,74%) - 298,77 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 37927/104648 (36,24%) - 310,56 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 39361/104648 (37,61%) - 285,03 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 40772/104648 (38,96%) - 276,99 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 42393/104648 (40,51%) - 320,17 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 43950/104648 (42,00%) - 307,53 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 45366/104648 (43,35%) - 278,85 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 46734/104648 (44,66%) - 272,73 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 48292/104648 (46,15%) - 304,89 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 49765/104648 (47,55%) - 286,52 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 51288/104648 (49,01%) - 297,17 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 52665/104648 (50,33%) - 268,68 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 54079/104648 (51,68%) - 276,71 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 55556/104648 (53,09%) - 289,10 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 57055/104648 (54,52%) - 296,07 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 58582/104648 (55,98%) - 304,43 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 59955/104648 (57,29%) - 272,04 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 61282/104648 (58,56%) - 263,71 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 62773/104648 (59,98%) - 297,25 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 64262/104648 (61,41%) - 295,03 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 65851/104648 (62,93%) - 311,94 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 67515/104648 (64,52%) - 331,74 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 69200/104648 (66,13%) - 332,87 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 70872/104648 (67,72%) - 332,27 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 72778/104648 (69,55%) - 375,34 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 74773/104648 (71,45%) - 389,27 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 76672/104648 (73,27%) - 375,07 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 78745/104648 (75,25%) - 409,44 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 80945/104648 (77,35%) - 437,29 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 83154/104648 (79,46%) - 436,30 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 85962/104648 (82,14%) - 556,37 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 88942/104648 (84,99%) - 586,73 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 92779/104648 (88,66%) - 748,68 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 98459/104648 (94,09%) - 1132,38 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 103319/104648 (98,73%) - 957,07 spectra/s]

    276vivian_ALG.mgf 4953ms [progress: 104648/104648 (100,00%) - 366,62 spectra/s]
 - completed 267357ms

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe workspace --init 
Process started
INFO[09:08:36] Creating workspace                           
WARN[09:08:36] existing workspace detected, will not overwrite 
INFO[09:08:36] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe peptideprophet --nonparam --expectscore --decoy rev --decoyprobs --masswidth 1000.0 --clevel -2 --database C:\Users\delv1901\Documents\FASTA\uniprot_hs_03_2017_GST_reverse_decoy.fasta C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp\276vivian_ALG.pepXML 
Process started
 file 1: C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp\276vivian_ALG.pepXML

 processed altogether 22781 results

INFO: Results written to file: C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp\interact-276vivian_ALG.pep.xml

  - C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp\interact-276vivian_ALG.pep.xml

  - Building Commentz-Walter keyword tree...

  - Searching the tree...
  - Linking duplicate entries...
  - Printing results...

Using Decoy Label "rev".
Decoy Probabilities will be reported.
Using non-parametric distributions
 (X! Tandem) (using Tandem's expectation score for modeling)

init with X! Tandem trypsin 

 PeptideProphet  (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64)) AKeller@ISB
 read in 0 1+, 10110 2+, 9045 3+, 2620 4+, 564 5+, 229 6+, and 0 7+ spectra.
Found 0 Decoys, and 22568 Non-Decoys
WARNING: No decoys with label rev were found in this dataset. reverting to fully unsupervised method.
negmean = 0.0533258

MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ... 
Initialising statistical models ...
INFO[09:08:54] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe workspace --clean 
Process started
INFO[09:08:54] Removing workspace                           
WARN[09:08:54] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[09:08:54] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe workspace --init 
Process started
INFO[09:08:54] Creating workspace                           
WARN[09:08:54] existing workspace detected, will not overwrite 
INFO[09:08:54] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe proteinprophet --output interact --maxppmdiff 20.0 interact-276vivian_ALG.pep.xml 
Process started
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64))
 (no FPKM) (using degen pep info)
Reading in C:/Users/delv1901/Documents/Data/20160704_altmid_mid_gfp/interact-276vivian_ALG.pep.xml...

did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0

WARNING: no data - output file will be empty

INFO[09:08:58] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe workspace --clean 
Process started
INFO[09:08:58] Removing workspace                           
WARN[09:08:58] cannot remove the meta data: remove .meta\meta.bin: The process cannot access the file because it is being used by another process. 
INFO[09:08:58] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe workspace --init 
Process started
INFO[09:08:58] Creating workspace                           
WARN[09:08:58] existing workspace detected, will not overwrite 
INFO[09:08:58] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe database --annotate C:\Users\delv1901\Documents\FASTA\uniprot_hs_03_2017_GST_reverse_decoy.fasta 
Process started
INFO[09:08:58] Processing database                          

INFO[09:09:30] Done                                         

Process finished, exit value: 0

Executing command:
$> C:\Users\delv1901\Documents\MSFragger-GUI_v2.6\philosopher_windows_amd64.exe filter --mapmods --sequential --pepxml C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp --protxml C:\Users\delv1901\Documents\Data\20160704_altmid_mid_gfp\interact.prot.xml 
Process started
INFO[09:09:30] Processing peptide identification files      

INFO[09:09:34] 1+ Charge profile                             decoy=0 target=0
INFO[09:09:34] 2+ Charge profile                             decoy=2019 target=18584
INFO[09:09:34] 3+ Charge profile                             decoy=2386 target=15240
INFO[09:09:34] 4+ Charge profile                             decoy=815 target=4289
INFO[09:09:34] 5+ Charge profile                             decoy=188 target=950
INFO[09:09:34] 6+ Charge profile                             decoy=94 target=348
INFO[09:09:34] Database search results                       ions=31664 peptides=29682 psms=45464

INFO[09:09:34] Converged to 0.00 % FDR with 45464 PSMs       decoy=0 threshold=0 total=45464

INFO[09:09:34] Converged to 0.00 % FDR with 29682 Peptides   decoy=0 threshold=0 total=29682

INFO[09:09:35] Converged to 0.00 % FDR with 31664 Ions       decoy=0 threshold=0 total=31664

FATA[09:09:35] No Protein groups detected, check your file and try again 

Process finished, exit value: 1

I will send you the FASTA via a link.

Best regards, Vivian

videlc commented 7 years ago

Here is the FASTA file: https://www.dropbox.com/s/1i4qff22v5f6p3j/uniprot_hs_03_2017_GST_reverse_decoy.fasta?dl=0

chhh commented 7 years ago

Thanks for the log and FASTA @videlc!

It looks like PeptideProphet could not find any decoy hits, so it reverted to the fully unsupervised method and failed silently:

PeptideProphet  (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64)) AKeller@ISB
 read in 0 1+, 10110 2+, 9045 3+, 2620 4+, 564 5+, 229 6+, and 0 7+ spectra.
Found 0 Decoys, and 22568 Non-Decoys
WARNING: No decoys with label rev were found in this dataset. reverting to fully unsupervised method.
...
INFO: Processing standard MixtureModel ... 
Initialising statistical models ...
INFO[09:08:54] Done                                         

Process finished, exit value: 0

So ProteinProphet didn't find any PeptideProphet results and didn't do anything:

ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.0.1 Post-Typhoon dev, Build 201705191533-7588 (Windows_NT-x86_64))
 (no FPKM) (using degen pep info)
Reading in C:/Users/delv1901/Documents/Data/20160704_altmid_mid_gfp/interact-276vivian_ALG.pep.xml...

did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0

WARNING: no data - output file will be empty
chhh commented 7 years ago

@videlc would you mind sharing 276vivian_ALG.mgf as well, so we could look into the issue? It's strange that using open search you got 20k forward hits and zero decoys.

videlc commented 7 years ago

Yes, it looks like there might be a problem there. However, the "second search" was able to find decoys; that's why I thought it was OK.

INFO[09:09:34] 1+ Charge profile                             decoy=0 target=0
INFO[09:09:34] 2+ Charge profile                             decoy=2019 target=18584
INFO[09:09:34] 3+ Charge profile                             decoy=2386 target=15240
INFO[09:09:34] 4+ Charge profile                             decoy=815 target=4289
INFO[09:09:34] 5+ Charge profile                             decoy=188 target=950
INFO[09:09:34] 6+ Charge profile                             decoy=94 target=348

I will send you a link to the MS file via email. Vivian

chhh commented 7 years ago

@videlc, Felipe (the person developing Philosopher) tells me that the decoy sequences in your FASTA file are marked as reverse "incorrectly". You have this:

tr|rev_A0A024QYW1|A0A024QYW1_HUMAN Isoform of A6NGB0, Transmembrane protein 191C OS=Homo sapiens GN=DKFZp434N035 PE=4 SV=1

and it "should" be this:

rev_tr|A0A024QYW1|A0A024QYW1_HUMAN Isoform of A6NGB0, Transmembrane protein 191C OS=Homo sapiens GN=DKFZp434N035 PE=4 SV=1

Notice that the rev modifier moved from the protein accession to the front of the whole description string.
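A plausible reading of why the tag position matters: the protein name seen downstream is the first word of the FASTA header, and the decoy label (here "rev", as in --decoy rev) is compared against the start of that name. The sketch below assumes that prefix rule; it is an illustration, not PeptideProphet's or Philosopher's actual code, and count_decoys and its path argument are hypothetical helpers for checking a database before a search.

```python
# Decoy label as passed to PeptideProphet in this thread (--decoy rev).
DECOY_LABEL = "rev"


def protein_name(header: str) -> str:
    """Return the first whitespace-delimited token of a FASTA header, without '>'."""
    tokens = header.lstrip(">").split(None, 1)
    return tokens[0] if tokens else ""


def is_decoy(header: str, label: str = DECOY_LABEL) -> bool:
    """Assumed rule: an entry is a decoy if its protein name starts with the label."""
    return protein_name(header).startswith(label)


def count_decoys(path: str, label: str = DECOY_LABEL) -> tuple:
    """Hypothetical sanity check: (targets, decoys) as seen by the prefix rule."""
    targets = decoys = 0
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if is_decoy(line, label):
                    decoys += 1
                else:
                    targets += 1
    return targets, decoys


original = ">tr|rev_A0A024QYW1|A0A024QYW1_HUMAN Isoform of A6NGB0, Transmembrane protein 191C"
corrected = ">rev_tr|A0A024QYW1|A0A024QYW1_HUMAN Isoform of A6NGB0, Transmembrane protein 191C"
print(is_decoy(original), is_decoy(corrected))  # False True
```

Under this rule, a database where every decoy looks like the first header would report zero decoys, which matches "Found 0 Decoys" in the log.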

chhh commented 7 years ago

@videlc What tool did you use to generate the DB with reverse-protein decoys?

videlc commented 7 years ago

Oh, I thought rev should be placed before the accession (I usually saw DECOY or REVERSE tags), so I thought that was the way to go.

The tool I used is this one: https://www.ruhr-uni-bochum.de/mpc/software/DecoyBuilder/index.html.en

It generates concatenated target/decoy FASTA files from target-only FASTA files. The decoy tag is added before the accession (after the "|"). I replaced that tag with "rev_" so that the file would meet the GUI's expectations.

videlc commented 7 years ago

Editing the FASTA fixed the issue. Which tool should I have used to generate the FASTA to avoid this problem?
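The thread does not say how the FASTA was edited. One way to do it, sketched below, is to move a "rev_" tag that sits right after the database code (">tr|rev_..." or ">sp|rev_...") to the front of the header, as Felipe suggested. The regex, fix_header and fix_fasta are hypothetical helpers written for this example; sequence lines are copied unchanged.

```python
import re

# Hypothetical pattern: ">" + database code (e.g. "tr", "sp") + "|rev_".
MISPLACED_TAG = re.compile(r"^>(\w+)\|rev_")


def fix_header(line: str) -> str:
    """Turn '>tr|rev_ACC|...' into '>rev_tr|ACC|...'; other lines are unchanged."""
    return MISPLACED_TAG.sub(r">rev_\1|", line, count=1)


def fix_fasta(src: str, dst: str) -> int:
    """Copy src to dst, fixing decoy headers; return how many headers changed."""
    changed = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            new = fix_header(line) if line.startswith(">") else line
            if new != line:
                changed += 1
            fout.write(new)
    return changed


print(fix_header(">tr|rev_A0A024QYW1|A0A024QYW1_HUMAN Isoform of A6NGB0"))
# >rev_tr|A0A024QYW1|A0A024QYW1_HUMAN Isoform of A6NGB0
```

Target headers such as ">tr|A0A024QYW1|..." do not match the pattern and pass through untouched.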

chhh commented 7 years ago

@videlc Philosopher has a philosopher.exe database ... command that provides tools to create such databases and append contaminants. TPP comes with Perl scripts for decoy database generation. OpenMS also has a tool, but its output format would be incompatible with PeptideProphet/ProteinProphet :(

We're working on better support for database generation.
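For readers who want to see what a reverse-decoy database with the tag in the expected position looks like, here is a minimal sketch: each target sequence is reversed and "rev_" is prefixed to the whole header. Plain sequence reversal is one common decoy strategy; this does not reproduce Philosopher's database command or the TPP Perl scripts, and it does not append contaminants. All function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA lines into (header, sequence) records."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def with_reverse_decoys(records: Iterable[Record], prefix: str = "rev_") -> List[Record]:
    """Return targets followed by reversed decoys tagged at the front of the header."""
    targets = list(records)
    decoys = [(prefix + header, seq[::-1]) for header, seq in targets]
    return targets + decoys


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` residues."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


db = with_reverse_decoys(read_fasta([">tr|A0A024QYW1|A0A024QYW1_HUMAN desc", "MKT", "LLV"]))
print(to_fasta(db), end="")
```

The decoy header produced here starts with "rev_tr|...", the form Felipe described as expected.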

videlc commented 7 years ago

Thank you @chhh for the tips and the quick support! I will continue exploring MSFragger GUI.