Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

sample intensities reported in protein.tsv after FragPipe v18 TMT workflow execution #750

Closed tobiasko closed 2 years ago

tobiasko commented 2 years ago

I am trying to analyse 2D-TPP data (5x TMT10plex in fractions) using the Bioconductor package TPP2D. The package expects the input data (protein-level abundance estimates) as one table per TMTnplex sub-experiment, presented as a list. I generate such a data structure by reading in the protein.tsv files found in the experiment folders (here named T1_2, T3_4, ...). Here is a listing of the FragPipe output folder:

tobiasko@fgcz-c-073:/scratch/tobiasko/o28644/output$ ls -l
total 176856
-rw-rw-rw-+ 1 tobiasko SG_Employees 148147113 Jun 27 10:49 combined.prot.xml
-rw-rw-rw-+ 1 tobiasko SG_Employees      5679 Jun 27 10:31 filelist_proteinprophet.txt
-rw-rw-rw-+ 1 tobiasko SG_Employees      9381 Jun 27 10:31 fragger.params
-rw-rw-rw-+ 1 tobiasko SG_Employees      5150 Jun 27 11:39 lcms-files_2022-06-27_11-39-36.fp-manifest
-rw-rw-rw-+ 1 tobiasko SG_Employees      5150 Jun 27 13:42 lcms-files_2022-06-27_13-42-08.fp-manifest
-rw-rw-rw-+ 1 tobiasko SG_Employees      5150 Jun 27 13:43 lcms-files_2022-06-27_13-43-52.fp-manifest
-rw-rw-rw-+ 1 tobiasko SG_Employees      5150 Jun 27 14:34 lcms-files_2022-06-27_14-34-00.fp-manifest
-rw-rw-rw-+ 1 tobiasko SG_Employees    640420 Jun 27 11:39 log_2022-06-27_11-39-36.txt
-rw-rw-rw-+ 1 tobiasko SG_Employees     21162 Jun 27 13:42 log_2022-06-27_13-42-08.txt
-rw-rw-rw-+ 1 tobiasko SG_Employees     23136 Jun 27 13:43 log_2022-06-27_13-43-52.txt
-rw-rw-rw-+ 1 tobiasko SG_Employees     85285 Jun 27 14:34 log_2022-06-27_14-34-00.txt
-rw-rw-rw-+ 1 tobiasko SG_Employees   8196768 Jul  3 19:47 modelParams.Rdata
-rw-rw-rw-+ 1 tobiasko SG_Employees   6245895 Jul  4 10:28 null_model_B20.Rdata
-rw-rw-rw-+ 1 tobiasko SG_Employees  17281328 Jul  1 14:32 preproc_df.RData
-rw-rw-rw-+ 1 tobiasko SG_Employees    384629 Jul  4 10:26 Rplots.pdf
drwxrwxrwx+ 1 tobiasko SG_Employees      3650 Jun 27 14:33 T1_2
drwxrwxrwx+ 1 tobiasko SG_Employees      3650 Jun 27 14:33 T3_4
drwxrwxrwx+ 1 tobiasko SG_Employees      3650 Jun 27 14:33 T5_6
drwxrwxrwx+ 1 tobiasko SG_Employees      3650 Jun 27 14:33 T7_8
drwxrwxrwx+ 1 tobiasko SG_Employees      3724 Jun 27 14:33 T9_10
-rw-rw-rw-+ 1 tobiasko SG_Employees       756 Jun 27 13:44 tmt-integrator-conf.yml
drwxrwxrwx+ 1 tobiasko SG_Employees        96 Jun 27 14:35 tmt-report

My protein.tsv files look like this:

tobiasko@fgcz-c-073:/scratch/tobiasko/o28644/output/T1_2$ head protein.tsv 
Protein Protein ID  Entry Name  Gene    Length  Organism    Protein Description Protein Existence   Protein Probability Top Peptide Probability Total Peptides  Unique Peptides Razor Peptides  Total Spectral Count    Unique Spectral Count   Razor Spectral Count    Total Intensity Unique Intensity    Razor Intensity Razor Assigned Modifications    Razor Observed Modifications    Indistinguishable Proteins   A1  B1  C1  D1  E1  A3  B3  C3  D3  E3
Biognosys|iRT-Kit_WR_fusion Biognosys|iRT-Kit_WR_fusion GN=iRTKit   Biognosys|iRT-Kit_WR_fusion GN=iRTKit   iRTKit  134     GN=iRTKit       1.0000  0.9990  6   6   6   98  98  98  586561752   586561752   586561752               9192440.0218    10226911.0808   18828271.4493   18123307.8367   11580617.9923   28214954.3546   29158890.6543   15442467.9109   11555365.3075   11724497.8057
...
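
For illustration, the "one table per plex, collected in a list" structure described above could be sketched in Python as follows. This is only a minimal sketch: the function name `read_channels` is hypothetical, the channel names must be passed in by hand, and the demo table is made-up data standing in for a real protein.tsv. The analysis itself would still be done in R with TPP2D.

```python
import csv
import io


def read_channels(handle, channels):
    """Return {protein_id: {channel: intensity}} from a protein.tsv-like table.

    `handle` is an open text file (tab-separated, with a header row).
    `channels` lists the channel column names to extract, e.g. ["A1", "B1"].
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in channels if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"channel columns not found: {missing}")
    table = {}
    for row in reader:
        # Empty cells are treated as missing values (NaN); this is an assumption.
        table[row["Protein ID"]] = {c: float(row[c] or "nan") for c in channels}
    return table


# Tiny in-memory stand-in for one protein.tsv (values are made up).
demo = "Protein ID\tGene\tA1\tB1\nP1\tG1\t100.0\t200.0\nP2\tG2\t50.0\t25.0\n"

# One entry per plex folder; real code would open <output>/<plex>/protein.tsv.
per_plex = {"T1_2": read_channels(io.StringIO(demo), ["A1", "B1"])}
print(per_plex["T1_2"]["P1"]["B1"])
```

With real data, each folder's file would be opened with `open(path, newline="")` and passed to the same function, giving one dictionary per plex.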

My question: Are the protein abundance estimates reported in protein.tsv (columns A1, B1, C1, ...) the raw reporter ion intensities, or are they already transformed in some way? Do the TMT-Integrator settings only affect the output written to the tmt-report folder?

Your webpage says: "additional columns for TMT/iTRAQ channels if applicable, each contains relative reporter ion abundances".

Really relative, not absolute? Relative to what?

The log is attached: log_2022-06-27_14-34-00.txt

prvst commented 2 years ago

The abundances in the Philosopher output files are all "raw". The idea is to give people the raw values so they can follow up with their preferred post-processing tools and methods. You can use TMT-Integrator to filter, clean, and normalize your abundances so that you can readily use them to draw conclusions. Our go-to method is based on TMT-Integrator, which uses reference channels to normalize the abundances. It's common practice to use pooled channels when you have multiple experiments in different plexes, as you do. If a pooled (a.k.a. bridge) channel is missing, TMT-I can also generate what we call a virtual channel to use as a reference and calculate the ratios. As an option, the program can also revert the ratio calculation and print straight abundances.
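
To make the reference-channel idea concrete, here is a rough Python sketch of turning raw channel intensities into log2 ratios against a reference. It is not TMT-Integrator's actual procedure: the function name `log2_ratios` is hypothetical, and defining the "virtual" reference as the mean of the positive channels is an assumption made only for this example.

```python
import math


def log2_ratios(intensities, reference=None):
    """Log2 ratio of each channel to a reference, for one protein in one plex.

    intensities: {channel: raw intensity}.
    reference: name of the pooled (bridge) channel, or None to fall back on a
    "virtual" reference, taken here as the mean of all positive channels
    (an illustrative assumption, not the tool's definition).
    """
    positive = [v for v in intensities.values() if v > 0]
    if reference is not None:
        ref = intensities[reference]
    else:
        ref = sum(positive) / len(positive) if positive else 0.0
    if ref <= 0:
        return {}  # no usable reference for this protein
    return {
        ch: math.log2(v / ref)
        for ch, v in intensities.items()
        if v > 0 and ch != reference
    }


raw = {"A1": 100.0, "B1": 200.0, "C1": 400.0}  # made-up intensities
print(log2_ratios(raw, reference="A1"))  # ratios to a pooled channel
print(log2_ratios(raw))                  # ratios to the virtual reference
```

Because every plex is expressed relative to its own reference, ratios from different plexes become comparable, which is the point of including a pooled channel.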

tobiasko commented 2 years ago

Great! That helps a lot. THX!