Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

Help needed. Empty PSM list stops pipeline. #328

Closed fstein closed 2 years ago

fstein commented 2 years ago

Hello,

I wanted to analyze a simple label-free experiment (an acid hydrolysis of a gel band). Unfortunately, the philosopher pipeline does not finish because the PSM list is empty. Analyzing the same experiment with IsobarQuant or MaxQuant identifies the one desired protein. What did I do wrong? The MS2 readout was done in the ion trap, so I set fragment_mass_tolerance to 0.5. The search also needs to be nonspecific, because the hydrolysis cleaves the protein nonspecifically. Did I set any parameter incorrectly? Please find the philosopher output and my philosopher.yml file below.

Thanks a lot in advance for your help.

Best,

Frank philosopher.yml.txt

fstein commented 2 years ago

Here is the output: philosopher pipeline --config C:\MS_TestRun_gelband/philosopher.yml C:\MS_TestRun_gelband INFO[10:31:57] Executing Pipeline v4.1.1 INFO[10:31:57] Creating workspace WARN[10:31:57] A meta data folder was found and will not be overwritten. INFO[10:31:57] Initiating the workspace on C:\MS_TestRun_gelband INFO[10:31:57] Creating workspace WARN[10:31:57] A meta data folder was found and will not be overwritten. INFO[10:31:57] Annotating the database INFO[10:31:57] Running the Database Search MSFragger version MSFragger-3.4 Batmass-IO version 1.23.6 timsdata library version timsdata-2-8-7-1 (c) University of Michigan RawFileReader reading tool. Copyright (c) 2016 by Thermo Fisher Scientific, Inc. All rights reserved. System OS: Windows 10, Architecture: AMD64 Java Info: 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, Oracle Corporation JVM started with 35 GB memory Checking database... Parameter 'search_enzyme_cutafter' was not supplied. Using default value: KR Parameter 'search_enzyme_butnotafter' was not supplied. Using default value: Parameter 'search_enzyme_name' was not supplied. Using default value: stricttrypsin Deisotoping doesn't support low resolution tandem mass spectra. Changing deisotope to 0. deisotope = 0. Changing deneutralloss to 0. Checking spectral files... 
C:\MS_TestRun_gelband\Zelda_220225_P1990_PH_JS_band01_H_R1.mzML: Scans = 18140 ***FIRST SEARCH**** Parameters: num_threads = 6 database_name = C:\MS_TestRun_gelband\2022-03-10-decoys-contam-Ecoli_UP000000625_05142016_4314entries.fasta.fas decoyprefix = rev precursor_mass_lower = -20.0 precursor_mass_upper = 20.0 precursor_mass_units = 1 data_type = 0 precursor_true_tolerance = 20.0 precursor_true_units = 1 fragment_mass_tolerance = 500.0 fragment_mass_units = 1 calibrate_mass = 2 use_all_mods_in_first_search = false write_calibrated_mgf = 0 isotope_error = 0 mass_offsets = 0 labile_search_mode = OFF restrict_deltamass_to = all precursor_mass_mode = SELECTED localize_delta_mass = false delta_mass_exclude_ranges = (-1.5,3.5) fragment_ion_series = b,y ion_series_definitions = search_enzyme_name = stricttrypsin search_enzyme_sense_1 = C search_enzyme_cut_1 = KR search_enzyme_nocut_1 = allowed_missed_cleavage_1 = 2 num_enzyme_termini = 0 clip_nTerm_M = true allow_multiple_variable_mods_on_residue = false max_variable_mods_per_peptide = 3 max_variable_mods_combinations = 5000 output_format = tsv_pepxml_pin output_report_topN = 1 output_max_expect = 50.0 report_alternative_proteins = false override_charge = false precursor_charge_low = 1 precursor_charge_high = 4 digest_min_length = 8 digest_max_length = 15 digest_mass_range_low = 500.0 digest_mass_range_high = 5000.0 max_fragment_charge = 2 deisotope = 0 deneutralloss = false track_zero_topN = 0 zero_bin_accept_expect = 0.0 zero_bin_mult_expect = 1.0 add_topN_complementary = 0 minimum_peaks = 10 use_topN_peaks = 150 minIonsScoring = 2 min_matched_fragments = 4 minimum_ratio = 0.01 intensity_transform = 0 remove_precursor_peak = 0 remove_precursor_range = -1.5,1.5 clear_mz_range_low = 0.0 clear_mz_range_high = 0.0 excluded_scan_list_file = mass_diff_to_variable_mod = 0 min_sequence_matches = 2 check_spectral_files = true variable_mod_02 = 42.01060 [^ 1 add_A_alanine = 0.000000 add_C_cysteine = 57.021464 add_Cterm_peptide 
= 0.0 add_Cterm_protein = 0.0 add_D_aspartic_acid = 0.000000 add_E_glutamic_acid = 0.000000 add_F_phenylalanine = 0.000000 add_G_glycine = 0.000000 add_H_histidine = 0.000000 add_I_isoleucine = 0.000000 add_K_lysine = 0.000000 add_L_leucine = 0.000000 add_M_methionine = 0.000000 add_N_asparagine = 0.000000 add_Nterm_peptide = 0.0 add_Nterm_protein = 0.0 add_P_proline = 0.000000 add_Q_glutamine = 0.000000 add_R_arginine = 0.000000 add_S_serine = 0.000000 add_T_threonine = 0.000000 add_V_valine = 0.000000 add_W_tryptophan = 0.000000 add_Y_tyrosine = 0.000000 Selected fragment index width 2.50 Da. 446665730 fragments to be searched in 1 slices (6.66 GB total) Operating on slice 1 of 1: Fragment index slice generated in 4.34 s

  1. Zelda_220225_P1990_PH_JS_band01_H_R1.mzML 1.4 s [progress: 17628/17628 (100%) - 3773 spectra/s] 4.7s | postprocessing 0.1 s ***FIRST SEARCH DONE IN 0.230 MIN**
****MASS CALIBRATION AND PARAMETER OPTIMIZATION****
Run | MS1 (Old) Median | MS1 (Old) MAD | MS1 (New) Median | MS1 (New) MAD | MS2 (Old) Median | MS2 (Old) MAD | MS2 (New) Median | MS2 (New) MAD
001 | -1.95 | 0.72 | -0.24 | 0.79 | 8.00 | 69.85 | -3.52 | 67.43

Finding the optimal parameters:
MS2 | 200 | 300
Count | 85 | 73

Peaks | 150_1 | 100_1 | 75_1
Count | 85 | 27 | skip rest

Int. | 1
Count | 75

Rm P. | 1
Count | 78

FragChg | 1
Count | 119

New fragment_mass_tolerance = 200 PPM New use_topN_peaks = 150 New minimum_ratio = 0.010000 New intensity_transform = 0 New remove_precursor_peak = 0 New max_fragment_charge = 1 ****MASS CALIBRATION AND PARAMETER OPTIMIZATION DONE IN 0.934 MIN*****

****MAIN SEARCH**** output_format = tsv_pepXML_pin but report_alternative_proteins = 0. Change report_alternative_proteins to 1. Checking database... Parameter 'search_enzyme_cutafter' was not supplied. Using default value: KR Parameter 'search_enzyme_butnotafter' was not supplied. Using default value: Parameter 'allowed_missed_cleavage' was not supplied. Using default value: 2 Parameter 'search_enzyme_name' was not supplied. Using default value: stricttrypsin Deisotoping doesn't support low resolution tandem mass spectra. Changing deisotope to 0. deisotope = 0. Changing deneutralloss to 0. variable_mod_03 has an empty value. variable_mod_04 has an empty value. variable_mod_05 has an empty value. variable_mod_06 has an empty value. variable_mod_07 has an empty value. Parameters: num_threads = 6 database_name = C:\MS_TestRun_gelband\2022-03-10-decoys-contam-Ecoli_UP000000625_05142016_4314entries.fasta.fas decoyprefix = rev precursor_mass_lower = -20.0 precursor_mass_upper = 20.0 precursor_mass_units = 1 data_type = 0 precursor_true_tolerance = 20.0 precursor_true_units = 1 fragment_mass_tolerance = 200.0 fragment_mass_units = 1 calibrate_mass = 2 use_all_mods_in_first_search = false write_calibrated_mgf = 0 isotope_error = 0/1/2 mass_offsets = 0 labile_search_mode = OFF restrict_deltamass_to = all precursor_mass_mode = SELECTED localize_delta_mass = false delta_mass_exclude_ranges = (-1.5,3.5) fragment_ion_series = b,y ion_series_definitions = search_enzyme_name = stricttrypsin search_enzyme_sense_1 = C search_enzyme_cut_1 = KR search_enzyme_nocut_1 = allowed_missed_cleavage_1 = 2 num_enzyme_termini = 0 clip_nTerm_M = true allow_multiple_variable_mods_on_residue = false max_variable_mods_per_peptide = 3 max_variable_mods_combinations = 5000 output_format = tsv_pepxml_pin output_report_topN = 1 output_max_expect = 50.0 report_alternative_proteins = true override_charge = false precursor_charge_low = 1 precursor_charge_high = 4 digest_min_length = 7 digest_max_length 
= 50 digest_mass_range_low = 500.0 digest_mass_range_high = 5000.0 max_fragment_charge = 2 deisotope = 0 deneutralloss = false track_zero_topN = 0 zero_bin_accept_expect = 0.0 zero_bin_mult_expect = 1.0 add_topN_complementary = 0 minimum_peaks = 10 use_topN_peaks = 150 minIonsScoring = 2 min_matched_fragments = 4 minimum_ratio = 0.01 intensity_transform = 0 remove_precursor_peak = 0 remove_precursor_range = -1.5,1.5 clear_mz_range_low = 0.0 clear_mz_range_high = 0.0 excluded_scan_list_file = mass_diff_to_variable_mod = 0 min_sequence_matches = 2 check_spectral_files = true variable_mod_01 = 15.99490 M 3 variable_mod_02 = 42.01060 [^ 1 add_A_alanine = 0.000000 add_C_cysteine = 57.021464 add_Cterm_peptide = 0.0 add_Cterm_protein = 0.0 add_D_aspartic_acid = 0.000000 add_E_glutamic_acid = 0.000000 add_F_phenylalanine = 0.000000 add_G_glycine = 0.000000 add_H_histidine = 0.000000 add_I_isoleucine = 0.000000 add_K_lysine = 0.000000 add_L_leucine = 0.000000 add_M_methionine = 0.000000 add_N_asparagine = 0.000000 add_Nterm_peptide = 0.0 add_Nterm_protein = 0.0 add_P_proline = 0.000000 add_Q_glutamine = 0.000000 add_R_arginine = 0.000000 add_S_serine = 0.000000 add_T_threonine = 0.000000 add_V_valine = 0.000000 add_W_tryptophan = 0.000000 add_Y_tyrosine = 0.000000 Selected fragment index width 1.00 Da. 11115526066 fragments to be searched in 7 slices (165.63 GB total) Operating on slice 1 of 7: Fragment index slice generated in 24.63 s

  1. Zelda_220225_P1990_PH_JS_band01_H_R1.mzBIN_calibrated 0.4 s [progress: 17620/17620 (100%) - 1932 spectra/s] 9.1s Operating on slice 2 of 7: Fragment index slice generated in 15.16 s
  2. Zelda_220225_P1990_PH_JS_band01_H_R1.mzBIN_calibrated 0.4 s [progress: 17620/17620 (100%) - 4174 spectra/s] 4.2s Operating on slice 3 of 7: Fragment index slice generated in 17.92 s
  3. Zelda_220225_P1990_PH_JS_band01_H_R1.mzBIN_calibrated 0.3 s [progress: 17620/17620 (100%) - 6340 spectra/s] 2.8s Operating on slice 4 of 7: Fragment index slice generated in 14.01 s
  4. Zelda_220225_P1990_PH_JS_band01_H_R1.mzBIN_calibrated 0.3 s [progress: 17620/17620 (100%) - 7948 spectra/s] 2.2s Operating on slice 5 of 7: Fragment index slice generated in 14.51 s
  5. Zelda_220225_P1990_PH_JS_band01_H_R1.mzBIN_calibrated 0.3 s [progress: 17620/17620 (100%) - 8375 spectra/s] 2.1s Operating on slice 6 of 7: Fragment index slice generated in 14.32 s
  6. Zelda_220225_P1990_PH_JS_band01_H_R1.mzBIN_calibrated 0.4 s [progress: 17620/17620 (100%) - 8806 spectra/s] 2.0s Operating on slice 7 of 7: Fragment index slice generated in 14.04 s
  7. Zelda_220225_P1990_PH_JS_band01_H_R1.mzBIN_calibrated 0.3 s [progress: 17620/17620 (100%) - 8347 spectra/s] 2.1s | postprocessing 2.5 s MAIN SEARCH DONE IN 2.800 MIN

***TOTAL TIME 3.965 MIN**** INFO[10:35:59] Running the validation and inference on C:\MS_TestRun_gelband INFO[10:35:59] Executing PeptideProphet on C:\MS_TestRun_gelband file 1: C:\MS_TestRun_gelband\Zelda_220225_P1990_PH_JS_band01_H_R1.pepXML processed altogether 14958 results INFO: Results written to file: C:\MS_TestRun_gelband\interact.pep.xml

using Accurate Mass Bins using PPM mass difference Using Decoy Label "rev_". Decoy Probabilities will be reported. Using non-parametric distributions (X! Tandem) (using Tandem's expectation score for modeling) adding ACCMASS mixture distribution using search_offsets in ACCMASS mixture distr: 0 init with X! Tandem stricttrypsin MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ... PeptideProphet (TPP v5.2.1-dev Flammagenitus, Build 201906281613-exported (Windows_NT-x86_64)) AKeller@ISB read in 0 1+, 5326 2+, 6021 3+, 2785 4+, 617 5+, 209 6+, and 0 7+ spectra. Initialising statistical models ... Found 6688 Decoys, and 8270 Non-Decoys Iterations: .........10.........20.........30 WARNING: Mixture model quality test failed for charge (1+). WARNING: Mixture model quality test failed for charge (5+). WARNING: Mixture model quality test failed for charge (7+). model complete after 31 iterations INFO[10:36:22] Running the validation and inference on C:\MS_TestRun_gelband INFO[10:36:22] Executing ProteinProphet on C:\MS_TestRun_gelband ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v6.0.0-rc15 Noctilucent, Build 202105101442-exported (Windows_NT-x86_64)) (no FPKM) (using degen pep info) Reading in C:/MS_TestRun_gelband/interact.pep.xml... ...read in 0 1+, 750 2+, 396 3+, 119 4+, 0 5+, 4 6+, 0 7+ spectra with min prob 0.05

Initializing 1008 peptide weights: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100% Calculating protein lengths and molecular weights from database c:/MS_TestRun_gelband/2022-03-10-decoys-contam-Ecoli_UP000000625_05142016_4314entries.fasta.fasotal: 8862 Computing degenerate peptides for 178 proteins: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100% Computing probabilities for 179 proteins. Loop 1: 0%...20%...40%...60%...80%...100% Loop 2: 0%...20%...40%...60%...80%...100% Computing probabilities for 179 proteins. Loop 1: 0%...20%...40%...60%...80%...100% Loop 2: 0%...20%...40%...60%...80%...100% Computing 175 protein groups: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100% Calculating sensitivity...and error tables... Computing MU for 179 proteins: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100% INFO: mu=5.05591e-05, db_size=5598271 INFO[10:36:23] Executing filter on C:\MS_TestRun_gelband INFO[10:36:23] Processing peptide identification files INFO[10:36:23] Printing models INFO[10:36:26] 1+ Charge profile decoy=0 target=0 INFO[10:36:26] 2+ Charge profile decoy=72 target=678 INFO[10:36:26] 3+ Charge profile decoy=46 target=350 INFO[10:36:26] 4+ Charge profile decoy=13 target=106 INFO[10:36:26] 5+ Charge profile decoy=0 target=0 INFO[10:36:26] 6+ Charge profile decoy=0 target=4 INFO[10:36:26] Database search results ions=1008 peptides=830 psms=1269 INFO[10:36:26] Converged to 0.98 % FDR with 405 PSMs decoy=4 threshold=0.98 total=409 INFO[10:36:26] Converged to 0.73 % FDR with 271 Peptides decoy=2 threshold=0.9865 total=273 INFO[10:36:26] Converged to 0.92 % FDR with 316 Ions decoy=2 threshold=0.9813 total=318 INFO[10:36:26] Protein inference results decoy=89 target=86 INFO[10:36:26] Converged to 50.00 % FDR with 2 Proteins decoy=1 threshold=0.9344 total=3 INFO[10:36:26] Applying sequential FDR estimation ions=328 peptides=298 psms=405 INFO[10:36:26] Converged to 0.00 % FDR with 405 PSMs decoy=0 
threshold=0.98 total=405 INFO[10:36:26] Converged to 0.00 % FDR with 298 Peptides decoy=0 threshold=0.9801 total=298 INFO[10:36:26] Converged to 0.00 % FDR with 328 Ions decoy=0 threshold=0.9801 total=328 INFO[10:36:27] Post processing identifications INFO[10:36:27] Mapping modifications INFO[10:36:27] Assigning protein identifications to layers INFO[10:36:27] Processing protein inference INFO[10:36:27] Synchronizing PSMs and proteins INFO[10:36:27] Total report numbers after FDR filtering, and post-processing ions=0 peptides=0 proteins=0 psms=0 INFO[10:36:27] Saving INFO[10:36:27] Executing label-free quantification on C:\MS_TestRun_gelband INFO[10:36:27] Indexing PSM information INFO[10:36:27] Reading spectra and tracing peaks INFO[10:36:27] Assigning intensities to data layers FATA[10:36:27] Cannot quantify data set. the PSM list is enpty

fstein commented 2 years ago

Also, why does it say "Parameter 'search_enzyme_name' was not supplied. Using default value: stricttrypsin"? The parameter.yml file only contains 'search_enzyme_name_1', which I set to 'nonspecific'. This value does not seem to be used. Should I add a parameter called 'search_enzyme_name' even though it is not part of the original parameter.yml file? Could this be the reason it does not run a nonspecific search?
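A quick way to see every parameter that MSFragger filled in on its own is to scan the search log for these fallback notices. The following is a minimal sketch, assuming the raw log prints one notice per line (the log pasted in this thread is flattened) and that the wording matches the message quoted above; the function name `find_defaulted_parameters` is hypothetical and not part of philosopher or MSFragger.

```python
import re

# Matches MSFragger's fallback notice as quoted in this thread.
# Assumption: the raw log has one notice per line.
_DEFAULT_NOTICE = re.compile(
    r"^Parameter '([^']+)' was not supplied\. Using default value:[ \t]*(.*)$",
    re.MULTILINE,
)


def find_defaulted_parameters(log_text: str) -> dict:
    """Return {parameter name: default value} for every fallback notice."""
    return {
        name: value.strip()
        for name, value in _DEFAULT_NOTICE.findall(log_text)
    }


# Sample lines copied from the log in this thread, one notice per line.
sample_log = "\n".join([
    "Checking database...",
    "Parameter 'search_enzyme_cutafter' was not supplied. Using default value: KR",
    "Parameter 'search_enzyme_butnotafter' was not supplied. Using default value: ",
    "Parameter 'search_enzyme_name' was not supplied. Using default value: stricttrypsin",
])

defaulted = find_defaulted_parameters(sample_log)
for name, value in defaulted.items():
    print(f"{name} fell back to {value!r}")
```

Any name that shows up here was not read from the configuration file, which makes it easier to spot a key that the pipeline expects under a different name.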

prvst commented 2 years ago

@fstein I suggest you look at the FragPipe workflows. The program has automatic configuration for several scenarios, including nonspecific searches. The fragger message might be related to the fact that the program was updated, and you are still using a configuration file from a previous version. Try generating a new parameter file before running again.

fstein commented 2 years ago

@prvst I am using the latest versions of philosopher and MSFragger (3.4), and the latest philosopher.yml file. This output comes from running the pipeline command.

prvst commented 2 years ago

Thanks, I'll take a look. On a side note, we will push forward the option of running a cmd version of fragpipe in the future. You can look at the program now to get familiar with it.

fstein commented 2 years ago

Having a cmd version of fragpipe would be extremely useful for us.

I already tried saving the MSFragger settings for a nonspecific search to a params file and making sure the parameters match the ones in my philosopher.yml file. It did not work :-(

prvst commented 2 years ago

@fcyu, can we send him the beta for testing?

fcyu commented 2 years ago

Sure.

@fstein , here (https://www.dropbox.com/s/rppc134jznqvonj/FragPipe-17.2-build27.zip?dl=1) is the link to download the pre-release version.

Best,

Fengchao

fstein commented 2 years ago

Thanks a lot...

prvst commented 2 years ago

Please submit new tickets to the FragPipe github in case you need help.

fstein commented 2 years ago

Is there some kind of documentation on how to use it from the command line? I will post future questions on the FragPipe GitHub page.

fcyu commented 2 years ago

Here (https://github.com/Nesvilab/FragPipe/issues/560#issuecomment-999748399) is some brief documentation.

fstein commented 2 years ago

"Thanks, I'll take a look. On a side note, we will push forward the option of running a cmd version of fragpipe in the future. You can look at the program now to get familiar with it."

So, although you just closed this thread, it would still be important for us to be able to analyze acid hydrolysis data with philosopher using the pipeline command. Let me know if you need any further files or information.

prvst commented 2 years ago

Let me add Alexey to this discussion. He might give you a better insight on how to process acid hydrolysis samples.

prvst commented 2 years ago

BTW, did you set the enzyme to nonspecific?

fstein commented 2 years ago

As you can see from the philosopher.yml file, I chose "search_enzyme_name_1: nonspecific" and "num_enzyme_termini: 0". Because we measured the MS2 spectra in the ion trap, I also set "fragment_mass_tolerance: 0.5".

When I put "nonspecific" as the search_enzyme_name_1, I got this error: "Parameter 'search_enzyme_name' was not supplied. Using default value: stricttrypsin". However, if I leave "search_enzyme_name: trypsin" and still set "num_enzyme_termini: 0", I don't get this error, but still no proteins are identified ("FATA[10:36:27] Cannot quantify data set. the PSM list is enpty").

I also tried FragPipe with the workflow "nonspecific-peptidome", and still no proteins were identified. I am grateful for any hints about what I might be doing wrong.
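Whether settings such as fragment_mass_tolerance and num_enzyme_termini actually reached the search can be checked against the "Parameters:" echo that MSFragger prints before each search. Below is a minimal sketch, assuming the raw log lists one `key = value` pair per line (the log in this thread is flattened); the helper names are hypothetical. A mismatch is only a prompt to look closer: in the log above, the optimization step itself reports "New fragment_mass_tolerance = 200 PPM" before the main search.

```python
import re

# One "key = value" pair per line, as assumed for the raw MSFragger log.
_ECHO_LINE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")


def parse_parameter_echo(log_text):
    """Collect the key = value pairs echoed by the search engine."""
    params = {}
    for line in log_text.splitlines():
        match = _ECHO_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def _same(a, b):
    """Compare numerically when both values are numbers, else as text."""
    try:
        return float(a) == float(b)
    except ValueError:
        return a == b


def find_mismatches(intended, echoed):
    """Return {key: (intended, echoed)} for missing or differing values."""
    return {
        key: (value, echoed.get(key))
        for key, value in intended.items()
        if key not in echoed or not _same(value, echoed[key])
    }


# Values copied from the main-search echo in this thread.
echo = "\n".join([
    "Parameters:",
    "fragment_mass_tolerance = 200.0",
    "fragment_mass_units = 1",
    "search_enzyme_name = stricttrypsin",
    "num_enzyme_termini = 0",
])
intended = {"fragment_mass_tolerance": "0.5", "num_enzyme_termini": "0"}

mismatches = find_mismatches(intended, parse_parameter_echo(echo))
print(mismatches)
```

Here num_enzyme_termini matches the intended value, while fragment_mass_tolerance is flagged for a closer look.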

prvst commented 2 years ago

@fcyu, can you look at the nonspecific-peptidome workflow, and see if you spot any issues? If I understood correctly, @fstein is also having issues running this search using FragPipe.

fcyu commented 2 years ago

Hi @fstein , if you have issues using FragPipe for the nonspecific-peptidome workflow, please send us the log file.

Best,

Fengchao

fstein commented 2 years ago

log_2022-03-15_14-00-17.txt

Here it is... Thanks for looking into this.

fcyu commented 2 years ago

Hi @fstein ,

Your log shows that the task finished without any error. But Felipe (@prvst) said: "If I understood correctly, @fstein is also having issues running this search using FragPipe."

So I am confused. Do you have any issues running FragPipe?

Best,

Fengchao

fstein commented 2 years ago

Dear Fengchao,

I don't have any issues running FragPipe or philosopher. It's just that no proteins were identified in this experiment. Analyzing it with IsobarQuant or MaxQuant yields tons of PSMs belonging to one protein (it's an acid hydrolysis of a rather clean gel band after purification of a protein). I just don't know why FragPipe and the philosopher pipeline do not yield any of these PSMs. For this experiment, we also measured the MS2 spectra in the ion trap. Could it be that the MS2 mass tolerance was not properly set to 0.5? Any hint is welcome here.

Best,

Frank

anesvi commented 2 years ago

Hi Frank,

We have a protein FDR filter in the pipeline. If you have just one protein, what is 1% protein FDR? In FragPipe, try checking “do not use protein file” (i.e. do not pass the protein file to the philosopher filter command). Or just do not run ProteinProphet at all.

Alexey
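The arithmetic behind this question can be sketched with a simple decoy/target ratio. This is an illustrative assumption, not philosopher's actual estimator, although it does reproduce the protein-level line in the pipeline log earlier in the thread ("Converged to 50.00 % FDR with 2 Proteins decoy=1 ... total=3"). The function names are hypothetical.

```python
def decoy_target_fdr(decoys: int, targets: int) -> float:
    """Estimate FDR as decoys / targets (a simplifying assumption)."""
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def passes_threshold(decoys: int, targets: int, max_fdr: float = 0.01) -> bool:
    """True when the estimated FDR does not exceed the requested level."""
    return decoy_target_fdr(decoys, targets) <= max_fdr


# Protein-level numbers from the pipeline log: 1 decoy among 3 entries.
print(f"{decoy_target_fdr(1, 2):.2%}")

# With a single true protein, one passing decoy already means 100 %,
# so a 1 % protein FDR cut can only be met when no decoy passes at all.
print(passes_threshold(1, 1))
print(passes_threshold(0, 1))
```

Under this assumption the smallest nonzero FDR reachable with n target proteins is 1/n, so at least 100 passing targets are needed before a single decoy can be tolerated at a 1% cut.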

fstein commented 2 years ago

Dear Alexey,

Thanks a lot for your comment. If I leave out the protein inference as you suggested (by setting Protein Inference to no in the Steps of the philosopher.yml file), I get the error: FATA[11:25:02] Cannot read file. open interact.prot.xml: The system cannot find the file specified. If I set both Protein Inference and FDR Filtering to no, I get the error: Cannot read file:open .meta\psm.bin: The system cannot find the file specified. Just setting proteinFDR to 1 in the FDR Filtering tab also does not put any peptides in the psm.tsv file. Could you be a bit more precise about which parameter in the philosopher.yml file I should set to which value?

When checking the raw_file_name.pin file, I found that many PSMs had been identified. How can I find out which parameter keeps these PSMs from being reported?

Thanks for your help.

anesvi commented 2 years ago

Hi Felipe,

Could you please help with this? Basically, there is only 1 protein, so we do not want to use protein FDR. What are the options in philosopher to turn it off, and are they still working?

Thanks Alexey

prvst commented 2 years ago

@fstein this is tricky. Are you working with a FASTA file that contains only one protein?

fstein commented 2 years ago

No, I am working with a FASTA file containing all proteins of E. coli plus the protein of interest.

prvst commented 2 years ago

I got your files and removed the link for you. I'll take a look.

fstein commented 2 years ago

Thanks a lot. The protein is called P1990_JS.

prvst commented 2 years ago

Hi @fstein, I found your protein in the sample, here's what I did:

  1. Run ProteinProphet to create the protein inference.
  2. Run the filter with the default options: filter --pepxml interact.pep.xml --protxml interact.prot.xml --razor
  3. Run the freequant to get the precursor intensity
  4. Run the report
INFO[15:48:03] Executing Filter  v4.2.1                     
INFO[15:48:03] Processing peptide identification files      
INFO[15:48:03] Parsing interact.pep.xml                     
INFO[15:48:06] 1+ Charge profile                             decoy=0 target=0
INFO[15:48:06] 2+ Charge profile                             decoy=2260 target=3066
INFO[15:48:06] 3+ Charge profile                             decoy=2743 target=3276
INFO[15:48:06] 4+ Charge profile                             decoy=1283 target=1501
INFO[15:48:06] 5+ Charge profile                             decoy=298 target=319
INFO[15:48:06] 6+ Charge profile                             decoy=102 target=107
INFO[15:48:06] Database search results                       ions=14032 peptides=13712 psms=14955
INFO[15:48:06] Converged to 0.99 % FDR with 392 PSMs         decoy=3 threshold=0.9811 total=395
INFO[15:48:06] Converged to 0.73 % FDR with 274 Peptides     decoy=2 threshold=0.9862 total=276
INFO[15:48:06] Converged to 0.96 % FDR with 311 Ions         decoy=2 threshold=0.9833 total=313
INFO[15:48:06] Protein inference results                     decoy=91 target=85
INFO[15:48:06] Converged to 100.00 % FDR with 1 Proteins     decoy=1 threshold=0.9931 total=2
INFO[15:48:06] 2D FDR estimation: Protein mirror image       decoy=1 target=1
INFO[15:48:06] Second filtering results                      ions=1151 peptides=881 psms=1583
INFO[15:48:06] Converged to 0.99 % FDR with 392 PSMs         decoy=3 threshold=0.9811 total=395
INFO[15:48:06] Converged to 0.73 % FDR with 274 Peptides     decoy=2 threshold=0.9862 total=276
INFO[15:48:06] Converged to 0.96 % FDR with 311 Ions         decoy=2 threshold=0.9833 total=313
INFO[15:48:06] Post processing identifications              
INFO[15:48:07] Assigning protein identifications to layers  
INFO[15:48:07] Processing protein inference                 
INFO[15:48:07] Synchronizing PSMs and proteins              
INFO[15:48:07] Total report numbers after FDR filtering, and post-processing  ions=311 peptides=274 proteins=1 psms=392
INFO[15:48:07] Saving                                       
INFO[15:48:07] Done   

image

I suggest you try one more time like I did above. I can send you the tables if you want, let me know.
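The four manual steps can be written down as a command sequence. The sketch below only builds and prints the commands; nothing is executed unless `run_all` is called. The filter command is taken verbatim from the comment above. The other three invocations (the `proteinprophet` input argument, `freequant --dir .`, and a bare `report`) are assumptions about the philosopher CLI and should be checked against each command's `--help` output. The sketch also assumes an already initialized workspace with an annotated database in the working directory.

```python
import subprocess


def build_manual_commands(pepxml="interact.pep.xml", protxml="interact.prot.xml"):
    """Return the four manual steps as argument lists, in order."""
    return [
        # 1. Protein inference (assumed to write interact.prot.xml).
        ["philosopher", "proteinprophet", pepxml],
        # 2. FDR filtering with default options, as quoted in the thread.
        ["philosopher", "filter", "--pepxml", pepxml, "--protxml", protxml, "--razor"],
        # 3. Label-free quantification; "--dir ." is an assumed flag that
        #    points at the folder holding the mzML files.
        ["philosopher", "freequant", "--dir", "."],
        # 4. Write the report tables.
        ["philosopher", "report"],
    ]


def run_all(commands, workdir):
    """Run each step in the workspace folder and stop at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=workdir, check=True)


commands = build_manual_commands()
for command in commands:
    print(" ".join(command))
```

Running the steps one at a time like this makes it easier to see which stage first reports zero PSMs than running the whole pipeline from the philosopher.yml file.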

fstein commented 2 years ago

This is great news. However, it is important for us that it works with the pipeline command and the philosopher.yml file. I am a little confused about why it works for you with the standard parameters. In that case it should also work with the pipeline command, shouldn't it? I also checked the interact.pep.xml file produced by the pipeline command, and it contains all the peptides, but I don't understand why the output report files stay empty. Here, the output files are only produced if I set the protein FDR to 1. If I leave it at 0.01, I get the error I reported above. Does the pipeline command work for you? If so, could you send me your philosopher.yml file?

fstein commented 2 years ago

In the philosopher.yml file, I now set only razor to true and set picked, mapMods, models and sequential to false. I got pretty much the same output as you, except that the last line reports zero ions, peptides, PSMs and proteins:

INFO[10:04:22] Executing filter on C:\MS_TestRun_gelband INFO[10:04:22] Processing peptide identification files INFO[10:04:24] 1+ Charge profile decoy=0 target=0 INFO[10:04:24] 2+ Charge profile decoy=2260 target=3066 INFO[10:04:24] 3+ Charge profile decoy=2743 target=3276 INFO[10:04:24] 4+ Charge profile decoy=1283 target=1501 INFO[10:04:24] 5+ Charge profile decoy=298 target=319 INFO[10:04:24] 6+ Charge profile decoy=102 target=107 INFO[10:04:24] Database search results ions=14032 peptides=13712 psms=14955 INFO[10:04:24] Converged to 0.99 % FDR with 392 PSMs decoy=3 threshold=0.9811 total=395 INFO[10:04:24] Converged to 0.73 % FDR with 274 Peptides decoy=2 threshold=0.9862 total=276 INFO[10:04:24] Converged to 0.96 % FDR with 311 Ions decoy=2 threshold=0.9833 total=313 INFO[10:04:24] Protein inference results decoy=91 target=85 INFO[10:04:24] Converged to 100.00 % FDR with 1 Proteins decoy=1 threshold=0.9931 total=2 INFO[10:04:24] 2D FDR estimation: Protein mirror image decoy=1 target=1 INFO[10:04:24] Second filtering results ions=1151 peptides=881 psms=1583 INFO[10:04:24] Converged to 0.99 % FDR with 392 PSMs decoy=3 threshold=0.9811 total=395 INFO[10:04:24] Converged to 0.73 % FDR with 274 Peptides decoy=2 threshold=0.9862 total=276 INFO[10:04:24] Converged to 0.96 % FDR with 311 Ions decoy=2 threshold=0.9833 total=313 INFO[10:04:24] Post processing identifications INFO[10:04:24] Assigning protein identifications to layers INFO[10:04:24] Processing protein inference INFO[10:04:24] Synchronizing PSMs and proteins INFO[10:04:24] Total report numbers after FDR filtering, and post-processing ions=0 peptides=0 proteins=0 psms=0 INFO[10:04:24] Saving INFO[10:04:24] Executing report on C:\MS_TestRun_gelband INFO[10:04:24] Creating reports INFO[10:04:24] Done

Any idea what might be the reason?
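The symptom in this log, a final totals line with zero ions, peptides, proteins and PSMs even though the FDR steps converged on several hundred, can be detected automatically before quantification is attempted. This is a minimal sketch, assuming the raw filter log has one entry per line and that the totals line keeps the wording shown above; the function names are hypothetical.

```python
import re

_TOTALS_MARKER = "Total report numbers after FDR filtering"


def final_report_counts(log_text):
    """Return the counts from the first totals line, or None if absent."""
    for line in log_text.splitlines():
        if _TOTALS_MARKER in line:
            _, _, tail = line.partition(_TOTALS_MARKER)
            return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", tail)}
    return None


def is_empty_result(counts):
    """True when a totals line was found and it reports zero PSMs."""
    return counts is not None and counts.get("psms", 0) == 0


# Lines copied from the two filter logs in this thread.
failing_log = "\n".join([
    "INFO[10:04:24] Converged to 0.99 % FDR with 392 PSMs decoy=3 threshold=0.9811 total=395",
    "INFO[10:04:24] Total report numbers after FDR filtering, and post-processing ions=0 peptides=0 proteins=0 psms=0",
])
working_log = (
    "INFO[15:48:07] Total report numbers after FDR filtering, "
    "and post-processing  ions=311 peptides=274 proteins=1 psms=392"
)

print(is_empty_result(final_report_counts(failing_log)))
print(final_report_counts(working_log))
```

A check like this could stop a scripted run with a clearer message than the later "Cannot quantify data set" failure.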

prvst commented 2 years ago

This might be because of the version you are using. We fixed a bug that affected the output of the results, pretty much like in your case. I can send you the current test version to try.

fstein commented 2 years ago

I am happy to try the new version and give you feedback...

fstein commented 2 years ago

Dear Felipe,

Some short feedback: the new version solved the issue with the empty summary files. Thanks a lot.

For a nonspecific search, one bug remains. If you set the following parameter in the philosopher.yml file: search_enzyme_name_1: nonspecific

It spits out the following error: "Parameter 'search_enzyme_cutafter' was not supplied. Using default value: KR Parameter 'search_enzyme_butnotafter' was not supplied. Using default value: Parameter 'search_enzyme_name' was not supplied. Using default value: stricttrypsin" So it will do a search with trypsin instead of a nonspecific search.

Apparently "nonspecific" is not a valid search_enzyme name. If you leave "search_enzyme_name_1: stricttrypsin" and set "num_enzyme_termini: 0", it will do a non-specific search nevertheless. And with your new version, I also did not encounter any further issues with this workaround.

But do you have any clue, why it does not allow "nonspecific" as a parameter there? If you choose a nonspecific search in FragPipe, it will set the "search_enzyme_name_1 = nonspecific" in the msfragger params file (in case one exports it). So I was assuming, that I could use this parameter as such also in the philosopher.yml file since most of the parameter names are matching.

Thanks a lot...

fstein commented 2 years ago

PS: Even with this error message mentioned above ("Parameter 'search_enzyme_cutafter' was not supplied. Using default value: KR Parameter 'search_enzyme_butnotafter' was not supplied. Using default value: Parameter 'search_enzyme_name' was not supplied. Using default value: stricttrypsin"), it still works and it identifies all peptides.

prvst commented 2 years ago

@fstein we are replacing the philosopher pipeline with the FragPipe CMD option, which is why we are putting more effort into adding new functionality to FragPipe than to the philosopher pipeline. I think we can send you the pre-release version for testing, and all the issues you're having should no longer be a problem.