Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

philosopher pipeline gives all zeros in TMT reporter intensities #174

Closed: rajanil closed this issue 3 years ago

rajanil commented 3 years ago

Hello, I've been trying to apply the Philosopher pipeline to a recently published CPTAC proteomics dataset. The workflow appears to run successfully, but the reporter ion intensities in the "peptide.tsv" generated by the report are almost all zeros.

Here's a Google Drive folder with all the data files, output files, the .meta folder, a log file, and the YAML config file that describes the workflow. I would greatly appreciate any suggestions on which parameters I've gotten wrong. (Except for the parameters specified in the manuscript, I've used default values.)

thanks! -Anil
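A quick way to put a number on "almost all zeros" is to count the rows of peptide.tsv whose reporter-ion columns are all zero. The sketch below is a minimal Python example, not part of Philosopher; the channel column names you pass in are assumptions and must be matched to the header of your own peptide.tsv, and empty cells are treated as zero.

```python
import csv
from typing import Iterable, Sequence


def zero_intensity_fraction(lines: Iterable[str], channels: Sequence[str]) -> float:
    """Return the fraction of rows whose listed reporter columns are all zero.

    `lines` is any iterable of text lines from a tab-separated report
    (for example, an open peptide.tsv file). `channels` lists the
    reporter-ion column names to inspect; these are assumed names.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # Reading fieldnames consumes the header row.
    missing = [c for c in channels if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    total = zero = 0
    for row in reader:
        total += 1
        # Empty cells are treated as zero intensity.
        values = [float(row[c] or 0) for c in channels]
        if all(v == 0.0 for v in values):
            zero += 1
    return zero / total if total else 0.0
```

For example, `zero_intensity_fraction(open("peptide.tsv", newline=""), ["126", "127N"])` returns a value close to 1.0 when nearly every peptide has zero reporter intensities, which is the symptom described above.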

prvst commented 3 years ago

@rajanil, I don't have permission to access your drive folder. I can send you a file request via Box if you prefer.

rajanil commented 3 years ago

@prvst I've just added access to felipevl@umich.edu. Let me know if you prefer another email. Or Box would work too for me.

prvst commented 3 years ago

@rajanil, this particular data set was quantified using MS3 spectra. Open your parameter file and scroll down to the Isobaric Quantification section. Change the level parameter to level: 3.

Feel free to reopen the issue if the problems persist.

rajanil commented 3 years ago

@prvst, thanks for this pointer! I re-ran the analysis after changing the parameter to level: 3, and get the same problem. I've updated the drive folder with my latest run output and yml file. (I'm not sure how to re-open this issue though.)

prvst commented 3 years ago

OK, try one more time with the following changes:

precursor_mass_lower: -20                   
precursor_mass_upper: 20                     
precursor_mass_units: 1                     
precursor_true_tolerance: 20                 
precursor_true_units: 1                      
fragment_mass_tolerance: 0.6                 
fragment_mass_units: 0                       
calibrate_mass: 0                            
deisotope: 1                                 
isotope_error: -1/0/1/2/3
variable_mod_01: 15.99490 M 3 
variable_mod_02: 42.01060 [^ 1  
variable_mod_03: 229.162932 n^ 1
variable_mod_04: 229.162932 S 1  

clear_mz_range: 125.5 131.5

FDR Filtering:                                   
  psmFDR: 0.01                                   
  peptideFDR: 0.01                              
  ionFDR: 0.01                                   
  proteinFDR: 0.01                               
  peptideProbability: 0.7                        
  proteinProbability: 0.5                        
  peptideWeight: 1                              
  razor: true                                   
  picked: true                                   
  mapMods: true                                  
  models: true                                  
  sequential: true 
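
In MSFragger parameter files, a units code of 1 means ppm and 0 means daltons, so the settings above request a ±20 ppm precursor window and a 0.6 Da fragment tolerance. The minimal Python sketch below (the function names are hypothetical) converts such tolerances into absolute mass bounds, assuming the lower and upper offsets are simply added to the observed mass; with the symmetric window used here, that sign convention does not change the result.

```python
from typing import Tuple


def ppm_to_da(mass: float, ppm: float) -> float:
    """Convert a ppm offset at a given mass into daltons."""
    return mass * ppm * 1e-6


def tolerance_bounds(mass: float, lower: float, upper: float, units: int) -> Tuple[float, float]:
    """Return (low, high) mass bounds; units 0 = Da, 1 = ppm (MSFragger codes)."""
    if units == 0:
        return mass + lower, mass + upper
    if units == 1:
        return mass + ppm_to_da(mass, lower), mass + ppm_to_da(mass, upper)
    raise ValueError(f"unsupported units code: {units}")
```

For a 1000 Da precursor, ±20 ppm corresponds to about ±0.02 Da, while the 0.6 Da fragment tolerance is an absolute width regardless of mass.
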

rajanil commented 3 years ago

Unfortunately, these changes don't seem to have resolved the issue. Just to clarify, did these changes work for you with this dataset? (I just want to make sure I'm not running the software incorrectly.) I have updated the Google Drive folder with my latest run. Is there a YAML file and FASTA database used for the publication that I should use to help with debugging?

prvst commented 3 years ago

Could you explain how you converted your raw files? Can you also paste the command you are executing here?

rajanil commented 3 years ago

I downloaded the mzML files from the CPTAC data portal (https://cptac-data-portal.georgetown.edu/datasets). I'm not sure how they were generated from the raw files.

anesvi commented 3 years ago

Also, you can try running it in the FragPipe GUI. There is a default workflow, TMT10-MS3. You can change it to TMT11 by changing the corresponding setting on the TMT-Integrator page. Also, change Define reference to ‘Reference sample’ and specify what the reference channel is called (the ‘Ref sample tag’ parameter). Check the annotation.tsv files; it is probably called ‘bridge’.

If you are running from the command line, you should get the same results, but it is harder to see which parameters may be wrong in your YAML file. Send me your YAML file by email and I can take a look.

Alexey
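
For the command-line route, the reference channel can be located by searching annotation.tsv for the reference tag. The Python sketch below assumes each line holds a channel label followed by a sample name, separated by whitespace (for example "127N bridge"), and treats any sample name containing the tag, case-insensitively, as a reference channel; both the layout and the matching rule are assumptions for illustration, not TMT-Integrator's documented behavior.

```python
from typing import Iterable, List


def find_reference_channels(lines: Iterable[str], ref_tag: str = "bridge") -> List[str]:
    """Return channel labels whose sample name contains ref_tag (case-insensitive)."""
    channels = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        channel, sample = parts[0], " ".join(parts[1:])
        if ref_tag.lower() in sample.lower():
            channels.append(channel)
    return channels
```

If this returns an empty list for a plex, the tag configured as ‘Ref sample tag’ probably does not match the names used in that annotation file.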

From: Anil Raj notifications@github.com Sent: Friday, January 8, 2021 9:47 AM To: Nesvilab/philosopher philosopher@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [Nesvilab/philosopher] philosopher pipeline gives all zeros in TMT reporter intensities (#174)


vinayakvsv commented 3 years ago

Hello all,

I do not know whether this issue is effectively closed, but I would be interested in finding out what changes to the YAML file were needed to fix the problem @rajanil was having. I also ran Philosopher in pipeline mode on the data from the paper linked above and replicated the modifications to the philosopher.yml file. I am also getting all-zero intensities in peptide.tsv and protein.tsv.

Thanks. --Vinay

prvst commented 3 years ago

See my earlier reply:

@rajanil, this particular data set was quantified using MS3 spectra. Open your parameter file and scroll down to the Isobaric Quantification section. Change the level parameter to level: 3.

Feel free to reopen the issue if the problems persist.

vinayakvsv commented 3 years ago

The problem still persisted; I had level: 3 in my original yml file.

prvst commented 3 years ago

Do you have MS3 scans in your data?
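
One way to answer this directly is to count spectra by MS level in the mzML file. The Python sketch below streams the file with the standard library and reads the PSI-MS controlled-vocabulary term MS:1000511 ("ms level") attached to each spectrum; it is a minimal illustration, not a Philosopher feature. A TMT-MS3 run should show level-3 spectra alongside the MS1 and MS2 spectra.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO, Union

MS_LEVEL_ACCESSION = "MS:1000511"  # PSI-MS term "ms level"


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def count_ms_levels(source: Union[str, bytes, IO[bytes]]) -> Counter:
    """Count spectra per MS level in an mzML (or indexed mzML) document.

    `source` is a file path, an open binary file, or the raw bytes of a document.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    levels = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "spectrum":
            continue
        # The "ms level" cvParam is a direct child of <spectrum>.
        for child in elem:
            if _local(child.tag) == "cvParam" and child.get("accession") == MS_LEVEL_ACCESSION:
                levels[int(child.get("value"))] += 1
                break
        elem.clear()  # free memory; mzML files can be several GB
    return levels
```

For example, `count_ms_levels("run01.mzML")` (a hypothetical file name) returns a Counter keyed by MS level; a missing or tiny level-3 count would point at the file rather than the Philosopher parameters.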

vinayakvsv commented 3 years ago

Yes -- this is the CPTAC Pediatric Brain Cancer dataset (I have been able to get non-zero intensities and otherwise proper output when replicating the steps for MS2-level data like CPTAC CCRCC).

rajanil commented 3 years ago

@vinayakvsv The underlying problem was an incompatibility between the published CPTAC mzML files and the version of Philosopher I was using.

$ ./philosopher version
INFO[11:40:42] Current Philosopher build and version build=1604525264 version=v3.3.12

Alexey suggested reconverting the raw files to mzML using the latest version of msconvert, and running Philosopher on these. This resolved the issue for me.

vinayakvsv commented 3 years ago

Okay. I will do this and report back if this fixes the issue for me. Thanks, all!

anesvi commented 3 years ago

Yes, Felipe, we need to resolve this somehow. If we cannot support the mzML version that CPTAC released (did you follow up with them?), Philosopher could print an informative error asking the user to reconvert the files. Hopefully we can have this in the next release.

From: Anil Raj @.> Sent: Tuesday, June 1, 2021 12:09 PM To: Nesvilab/philosopher @.> Cc: Nesvizhskii, Alexey @.>; Comment @.> Subject: Re: [Nesvilab/philosopher] philosopher pipeline gives all zeros in TMT reporter intensities (#174)


prvst commented 3 years ago

Yes. It seems that we need to start reporting the version, and stop if it is an old one.
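
The converter that produced an mzML file is recorded in its header, in the <software> entries of the softwareList. The Python sketch below lists those entries and stops parsing at the <run> element, so it never reads the spectrum data. It does not reproduce any actual Philosopher check, and deciding which versions count as "old" is left to the caller.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, List, Tuple, Union


def _local(tag: str) -> str:
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def mzml_software(source: Union[str, bytes, IO[bytes]]) -> List[Tuple[str, str, List[str]]]:
    """List (id, version, CV term names) for each <software> entry in an mzML header."""
    if isinstance(source, str):
        with open(source, "rb") as fh:  # path: open, parse, and close the file
            return mzml_software(fh)
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    found = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        tag = _local(elem.tag)
        if event == "start" and tag == "run":
            break  # header is complete; skip the (large) spectrum data
        if event == "end" and tag == "software":
            names = [c.get("name", "") for c in elem if _local(c.tag) == "cvParam"]
            found.append((elem.get("id", ""), elem.get("version", ""), names))
    return found
```

Printing this list for each input file would make it easy to see at a glance which converter and version produced data like the published CPTAC mzML files.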

vinayakvsv commented 3 years ago

Hello @prvst and @anesvi,

Re-downloading the .raw files and reconverting them to indexed mzML appears to have worked: from 01CBTTC_PBT_Proteome_HMS_20180626_mzML alone, I get about 4699 proteins in the TMT-Integrator results. One slight departure from your recommendation: I converted the files with ThermoRawFileParser under Mono (https://github.com/compomics/ThermoRawFileParser), because the computing cluster I use does not currently allow running msconvert from the Docker image described in the ProteoWizard documentation. Please let me know whether this approach and the result seem reasonable.

Thank you for your help, and thank you for a very tractable pipeline. --Vinay
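
A ThermoRawFileParser conversion under Mono can be scripted along the lines below. This is a minimal sketch, not the exact command used in the thread: it assumes ThermoRawFileParser.exe is at the given path and relies on the -i (input file), -o (output directory), and -f=2 (indexed mzML) options described in the ThermoRawFileParser README; confirm them against the help output of your installed version.

```python
import subprocess
from pathlib import Path
from typing import List


def build_trfp_command(raw_file: str, out_dir: str,
                       exe: str = "ThermoRawFileParser.exe") -> List[str]:
    """Build a Mono command line converting one .raw file to indexed mzML (-f=2)."""
    return ["mono", exe, f"-i={raw_file}", f"-o={out_dir}", "-f=2"]


def convert_all(raw_dir: str, out_dir: str, exe: str = "ThermoRawFileParser.exe") -> None:
    """Convert every .raw file in raw_dir, stopping at the first failed conversion."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for raw in sorted(Path(raw_dir).glob("*.raw")):
        subprocess.run(build_trfp_command(str(raw), out_dir, exe), check=True)
```

Running one file per call keeps failures easy to attribute; on a cluster, each call could equally be submitted as a separate job.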

anesvi commented 3 years ago

Felipe, can you check the numbers and make sure they used our parameters? Thanks, Alexey

From: vinayakvsv @.> Sent: Tuesday, June 1, 2021 11:46 PM To: Nesvilab/philosopher @.> Cc: Nesvizhskii, Alexey @.>; Mention @.> Subject: Re: [Nesvilab/philosopher] philosopher pipeline gives all zeros in TMT reporter intensities (#174)


prvst commented 3 years ago

@vinayakvsv, when you say 4699 proteins, do you mean from the entire cohort or from a few selected data sets? In my most recent run, I got more than 9k proteins.

vinayakvsv commented 3 years ago

Just from 01CBTTC_PBT_Proteome_HMS_20180626_mzML.