Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

msstatsTMT #510

Open KlemensFroehlich opened 2 years ago

KlemensFroehlich commented 2 years ago

Question: For label-free data, FragPipe can export MSstats-compatible output (which is really awesome). Could you also support this for MSstatsTMT? Its input file format is different and requires much more information than can currently be specified in FragPipe. You can currently set VALIDATION -> GENERATE MSSTATS FILES to TRUE during a TMT analysis, but the resulting msstats output is not compatible with MSstatsTMT. Alternatively, it would be nice if FragPipe indicated that the msstats output can only be generated for non-TMT data.

and off topic: TMT18 plex support would be awesome!

Best Klemens

fcyu commented 2 years ago

We will support MSstatsTMT in the future. Stay tuned!

BTW, for LFQ data, we recommend using IonQuant in the MS1 Quant tab and not enabling "Generate MSstats files" in the Validation tab. IonQuant always generates an MSstats.tsv with LFQ intensities from all experiments.

Best,

Fengchao

KlemensFroehlich commented 2 years ago

Thanks Fengchao for the answer. Looking forward to using fragpipe for everything, including TMT in the future :)

Best, Klemens

tobiasko commented 1 year ago

Dear FragPipe team,

I was wondering what is the current status of the MSstatsTMT support? I instructed FragPipe 19.0 to generate msstats.csv files in the context of a TMT10plex workflow. But I am struggling to identify the intended way of importing the data into MSstatsTMT. My naive guess was that

MSstatsTMT::PhilosophertoMSstatsTMTFormat(path = paste0(path,"TMT10plex_T1_2"), folder = TRUE, annotation = paste0(path,"/combined_annotation.tsv")
+ )
INFO  [2023-03-14 17:04:24] ** Raw data from Philosopher imported successfully.
Error in annotation[["Channel"]] : subscript out of bounds
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lattice_0.20-45     BiocParallel_1.28.3 TPP2D_1.10.0        dplyr_1.0.8         readr_2.1.2        

loaded via a namespace (and not attached):
 [1] tidyr_1.2.0           bit64_4.0.5           vroom_1.5.7           splines_4.1.2         foreach_1.5.2         gtools_3.9.2          assertthat_0.2.1      yaml_2.3.5           
 [9] ggrepel_0.9.1         numDeriv_2016.8-1.1   backports_1.4.1       pillar_1.7.0          glue_1.6.2            limma_3.50.1          digest_0.6.29         checkmate_2.0.0      
[17] minqa_1.2.4           colorspace_2.0-3      preprocessCore_1.56.0 htmltools_0.5.2       Matrix_1.4-1          pkgconfig_2.0.3       MSstatsTMT_2.2.7      purrr_0.3.4          
[25] scales_1.1.1          openxlsx_4.2.5        tzdb_0.3.0            lme4_1.1-28           tibble_3.1.6          generics_0.1.2        farver_2.1.0          ggplot2_3.3.5        
[33] ellipsis_0.3.2        withr_2.5.0           cli_3.2.0             survival_3.3-1        magrittr_2.0.3        crayon_1.5.1          evaluate_0.15         fansi_1.0.3          
[41] doParallel_1.0.17     nlme_3.1-157          MASS_7.3-56           log4r_0.4.2           gplots_3.1.1          tools_4.1.2           data.table_1.14.2     hms_1.1.1            
[49] lifecycle_1.0.1       stringr_1.4.0         munsell_0.5.0         zip_2.2.0             MSstats_4.2.0         compiler_4.1.2        caTools_1.18.2        rlang_1.0.2          
[57] grid_4.1.2            RCurl_1.98-1.6        nloptr_2.0.0          iterators_1.0.14      rstudioapi_0.13       marray_1.72.0         bitops_1.0-7          labeling_0.4.2       
[65] rmarkdown_2.13        boot_1.3-28           lmerTest_3.1-3        gtable_0.3.0          codetools_0.2-18      DBI_1.1.2             R6_2.5.1              knitr_1.38           
[73] fastmap_1.1.0         bit_4.0.4             utf8_1.2.2            MSstatsConvert_1.4.1  KernSmooth_2.23-20    stringi_1.7.6         parallel_4.1.2        Rcpp_1.0.8.3         
[81] vctrs_0.4.0           tidyselect_1.1.2      xfun_0.30            

would be the right way. But it seems like the annotation file is not structured in the expected way. I also noticed that the combined_annotation.tsv has been renamed in the latest FragPipe release. Is this file intended for MSstats import at all? Or does one need to construct the annotation file manually according to the MSstatsTMT package vignette (but PhilosophertoMSstatsTMTFormat() is the right import function to use)?

> library(readr)
> msstats_T1_2 <- read_csv("~/Downloads/WU286728/TMT10plex_T1_2/msstats.csv")
Rows: 170933 Columns: 23                                                                                                                                                                                                
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): Spectrum.Name, Spectrum.File, Peptide.Sequence, Modified.Peptide.Sequence, Gene, Protein.Accessions
dbl (15): Charge, Calculated.MZ, PeptideProphet.Probability, Intensity, Purity, Channel 126, Channel 127N, Channel 127C, Channel 128N, Channel 128C, Channel 129N, Channel 129C, ...
lgl  (2): Is.Unique, Modifications

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> View(msstats_T1_2)
> msstats_T1_2
# A tibble: 170,933 × 23
   Spectrum.Name  Spectrum.File Peptide.Sequence Modified.Peptid… Charge Calculated.MZ PeptideProphet.… Intensity Is.Unique Gene  Protein.Accessi… Modifications Purity `Channel 126`
   <chr>          <chr>         <chr>            <chr>             <dbl>         <dbl>            <dbl>     <dbl> <lgl>     <chr> <chr>            <lgl>          <dbl>         <dbl>
 1 20230220_002_… 20230220_002… SHHEDRAGHGHSADS… n[230]SHHEDRAGH…      3          691.            0.822    20767. FALSE     FLG   sp|P20930|FILA_… NA              1               0 
 2 20230220_002_… 20230220_002… RRVEHHDHAVVSGR   NA                    4          414.            0.806   100653. FALSE     AIFM1 sp|O95831|AIFM1… NA              0.84            0 
 3 20230220_002_… 20230220_002… RVEHHDHAVVSGR    NA                    4          375.            0.999  6101583  FALSE     AIFM1 sp|O95831|AIFM1… NA              0.95            0 
 4 20230220_002_… 20230220_002… HGSGLGHSSSHGQHG… n[230]HGSGLGHSS…      5          424.            1       412864. FALSE     HRNR  sp|Q86YZ3|HORN_… NA              0.98            0 
 5 20230220_002_… 20230220_002… HEECSRPHNGR      n[230]HEECSRPHN…      4          403.            0.913   491961. TRUE      THOC6 sp|Q86W42|THOC6… NA              0.95          932.
 6 20230220_002_… 20230220_002… HGGEDGRNNSGAPHR  n[230]HGGEDGRNN…      4          448.            0.777   438030. FALSE     ACBD5 tr|A0A7I2V2Y9|A… NA              0.88            0 
 7 20230220_002_… 20230220_002… NTPSQHSHSIQHSPER NA                    3          615.            1       785370. FALSE     BCLA… sp|Q9NYF8|BCLF1… NA              0.73            0 
 8 20230220_002_… 20230220_002… NTPSQHSHSIQHSPER NA                    4          461.            1.00   2382806  FALSE     BCLA… sp|Q9NYF8|BCLF1… NA              0.69            0 
 9 20230220_002_… 20230220_002… SHHKDHSDSESTSSD… n[230]SHHKDHSDS…      5          506.            0.998    96035. FALSE     KDM6A sp|O15550|KDM6A… NA              0.83            0 
10 20230220_002_… 20230220_002… GNCNRGENDCR      n[230]GNCNRGEND…      3          528.            1       597927. FALSE     MBNL1 sp|Q9NR56|MBNL1… NA              0.86         6457.
# … with 170,923 more rows, and 9 more variables: `Channel 127N` <dbl>, `Channel 127C` <dbl>, `Channel 128N` <dbl>, `Channel 128C` <dbl>, `Channel 129N` <dbl>,
#   `Channel 129C` <dbl>, `Channel 130N` <dbl>, `Channel 130C` <dbl>, `Channel 131N` <dbl>
> 

Thanks a lot for your help, Tobi

fcyu commented 1 year ago

Hi Tobi,

We have a version that supports MSstatsTMT better. We have a tutorial about it: https://docs.google.com/document/d/1TqO9WDI3k_1FTOI1dQYV4D4nf7C9TX7Xl9AzHxYNe84/edit

But it seems like the annotation file is not structured in the expected way. I also noticed that the combined_annotation.tsv has been renamed in the latest FragPipe release. Is this file intended for MSstats import at all? Or does one need to construct the annotation file manually according to the MSstatsTMT package vignette (but PhilosophertoMSstatsTMTFormat() is the right import function to use)?

The combined_annotation.tsv is for FragPipe-Analyst. The pre-released version has another annotation file for it.

Could you please take a look and try the pre-released version?

Thanks,

Fengchao

tobiasko commented 1 year ago

Hi @fcyu

thanks for sharing the tutorial. I had a look and generated a corresponding MSstatsTMT_annotation.csv for my local data. It looks like:

> Dataset_44878_item_
# A tibble: 10 × 7
   Run                                           Fraction TechRepMixture Mixture Channel BioReplicate Condition 
   <chr>                                            <dbl>          <dbl>   <dbl> <chr>   <chr>        <chr>     
 1 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 126     S449267_126  37_5      
 2 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 127N    S449267_127N 37_1      
 3 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 127C    S449267_127C 37_0.134  
 4 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 128N    S449267_128N 37_0.02   
 5 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 128C    S449267_128C 37_0      
 6 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 129N    S449267_129N 39.3_5    
 7 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 129C    S449267_129C 39.3_1    
 8 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 130N    S449267_130N 39.3_0.134
 9 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 130C    S449267_130C 39.3_0.02 
10 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw        1              1       1 131N    S449267_131N 39.3_0    
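
For anyone building a similar table by hand, here is a minimal base R sketch of an annotation with the same seven columns. The run name, BioReplicate labels, and conditions are placeholders I made up for illustration, and which columns MSstatsTMT strictly requires should be checked against its vignette.

```r
# Channel labels for a TMT10plex, in the order shown in the table above.
channels <- c("126", "127N", "127C", "128N", "128C",
              "129N", "129C", "130N", "130C", "131N")

# One row per (run, channel). All values below are placeholders.
annotation <- data.frame(
  Run = "example_run_01",                      # hypothetical run name
  Fraction = 1,
  TechRepMixture = 1,
  Mixture = 1,
  Channel = channels,
  BioReplicate = paste0("sample_", channels),  # hypothetical sample labels
  Condition = rep(c("A", "B"), each = 5),      # hypothetical conditions
  stringsAsFactors = FALSE
)

print(annotation)
```

With several runs or mixtures, the same pattern would be repeated per run and the pieces combined with rbind().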

But when I execute the import function I get:

> test <- MSstatsTMT::PhilosophertoMSstatsTMTFormat(path = paste0(path,"TMT10plex_T1_2"), folder = TRUE, annotation = Dataset_44878_item_)
INFO  [2023-03-15 08:51:02] ** Raw data from Philosopher imported successfully.
INFO  [2023-03-15 08:51:03] ** Using provided annotation.
INFO  [2023-03-15 08:51:03] ** Run and Channel labels were standardized to remove symbols such as '.' or '%'.
INFO  [2023-03-15 08:51:03] ** The following options are used:
  - Features will be defined by the columns: PeptideSequence, PrecursorCharge
  - Shared peptides will be removed.
  - Proteins with single feature will not be removed.
  - Features with less than 3 measurements within each run will be removed.
INFO  [2023-03-15 08:51:03] ** Rows with values not greater than 0.6 in Purity are removed 
INFO  [2023-03-15 08:51:03] ** Rows with values not greater than 0.7 in PeptideProphetProbability are removed 
INFO  [2023-03-15 08:51:03] ** Sequences containing Oxidation are removed.
INFO  [2023-03-15 08:51:03] ** Features with all missing measurements across channels within each run are removed.
INFO  [2023-03-15 08:51:04] ** Shared peptides are removed.
INFO  [2023-03-15 08:51:04] ** Features with one or two measurements across channels within each run are removed.
INFO  [2023-03-15 08:51:17] ** PSMs have been aggregated to peptide ions.
INFO  [2023-03-15 08:51:18] ** Run annotation merged with quantification data.
WARN  [2023-03-15 08:51:18] ** Condition in the input file must match condition in annotation.
INFO  [2023-03-15 08:51:19] ** Features with one or two measurements across channels within each run are removed.
INFO  [2023-03-15 08:51:19] ** Fractionation handled.
INFO  [2023-03-15 08:51:20] ** Updated quantification data to make balanced design. Missing values are marked by NA
INFO  [2023-03-15 08:51:20] ** Finished preprocessing. The dataset is ready to be processed by the proteinSummarization function.

> head(test)
           ProteinName           PeptideSequence Charge                         PSM Mixture TechRepMixture                                        Run Channel BioReplicate Condition
1 sp|Q9Y4H2|IRS2_HUMAN AAAAAAAAVPSAGPAGPAPTSAAGR      3 AAAAAAAAVPSAGPAGPAPTSAAGR_3    <NA>           <NA> 20230220_012_S449277_TMT10plex_T1_2_11_rep     126         <NA>      <NA>
2 sp|Q9Y4H2|IRS2_HUMAN AAAAAAAAVPSAGPAGPAPTSAAGR      3 AAAAAAAAVPSAGPAGPAPTSAAGR_3    <NA>           <NA> 20230220_012_S449277_TMT10plex_T1_2_11_rep    127C         <NA>      <NA>
3 sp|Q9Y4H2|IRS2_HUMAN AAAAAAAAVPSAGPAGPAPTSAAGR      3 AAAAAAAAVPSAGPAGPAPTSAAGR_3    <NA>           <NA> 20230220_012_S449277_TMT10plex_T1_2_11_rep    127N         <NA>      <NA>
4 sp|Q9Y4H2|IRS2_HUMAN AAAAAAAAVPSAGPAGPAPTSAAGR      3 AAAAAAAAVPSAGPAGPAPTSAAGR_3    <NA>           <NA> 20230220_012_S449277_TMT10plex_T1_2_11_rep    128C         <NA>      <NA>
5 sp|Q9Y4H2|IRS2_HUMAN AAAAAAAAVPSAGPAGPAPTSAAGR      3 AAAAAAAAVPSAGPAGPAPTSAAGR_3    <NA>           <NA> 20230220_012_S449277_TMT10plex_T1_2_11_rep    128N         <NA>      <NA>
6 sp|Q9Y4H2|IRS2_HUMAN AAAAAAAAVPSAGPAGPAPTSAAGR      3 AAAAAAAAVPSAGPAGPAPTSAAGR_3    <NA>           <NA> 20230220_012_S449277_TMT10plex_T1_2_11_rep    129C         <NA>      <NA>
  Intensity
1  33370.71
2  36490.16
3  31099.41
4  34305.99
5  36577.45
6  32279.51

It warns that "Condition in the input file must match condition in annotation." and fills the annotation columns with missing values only. But I can't see any condition in the input data (the MSstats.csv file).

Does the pre-release version change the content of MSstats.csv? I can't easily change to a pre-release, since the data was processed by our scripted production pipeline at the core facility.

Best, Tobi

anesvi commented 1 year ago

Hi Tobi, we generated the file by working closely with Devon from Olga Vitek's lab (MSstats). He tested it extensively. Can you email me directly so I can forward your email to him? Thanks, Alexey

fcyu commented 1 year ago

Hi Tobi,

Does the pre-release version change the content of MSstats.csv? I can't easily change to a pre-release, since the data was processed by our scripted production pipeline at the core facility.

I believe we changed something in Philosopher, but I can't remember what, since there is no changelog for the RC versions. I suggest you re-process some of your data with the pre-release version to make sure that it works for you.

Best,

Fengchao

tobiasko commented 1 year ago

Thanks for the kind offer @anesvi! I sent the email.

anesvi commented 1 year ago

Hi Tobi, yes you have to use the pre-release version. We changed the msstats files that philosopher writes so they are compatible with msstatsTMT. The publicly released version is not compatible.

tobiasko commented 1 year ago

I fear I can't use the pre-release at the moment. Then this issue needs to wait until we have an official release.

clairesimpson95 commented 1 year ago

I see this exact issue when I keep the extension (.raw, .mzml) in the Run column of the annotation table (NAs in many columns), and when I remove the extension (in your case, it would be 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw to 20230220_002_S449267_TMT10plex_T1_2_1_rep), the Condition, Mixture, BioReplicate, and TechRepMixture columns go from NA to the actual names. There may also be other reasons to wait until the new release, but try changing the Run column and see if that fixes this issue.

tobiasko commented 1 year ago

@clairesimpson95 This depends on the content of your input data table. In my case Spectrum File also includes the extension (.raw). They just need to match.
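
A quick way to see whether the two sides agree is to compare the run names directly before calling the import function. This is only a sketch in base R: `compare_run_names` is a hypothetical helper, the toy values are made up, and MSstatsTMT may standardize labels before matching, so treat it as a first check rather than a reproduction of the package's logic.

```r
# Hypothetical helper: report run names that appear on only one side.
# `annotation_runs` stands for the Run column of the annotation table,
# `input_runs` for the Spectrum.File column of msstats.csv.
compare_run_names <- function(annotation_runs, input_runs) {
  annotation_runs <- unique(as.character(annotation_runs))
  input_runs <- unique(as.character(input_runs))
  list(
    only_in_annotation = setdiff(annotation_runs, input_runs),
    only_in_input = setdiff(input_runs, annotation_runs),
    all_matched = setequal(annotation_runs, input_runs)
  )
}

# Toy values, not taken from a real result file.
annotation_runs <- c("run_01.raw", "run_02.raw")
input_runs <- c("run_01", "run_02", "run_01")

result <- compare_run_names(annotation_runs, input_runs)
print(result)
```

If either `only_in_*` vector is non-empty, the annotation cannot be joined to those runs as they stand.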

41ison commented 10 months ago

I see this exact issue when I keep the extension (.raw, .mzml) in the Run column of the annotation table (NAs in many columns), and when I remove the extension (in your case, it would be 20230220_002_S449267_TMT10plex_T1_2_1_rep.raw to 20230220_002_S449267_TMT10plex_T1_2_1_rep), the Condition, Mixture, BioReplicate, and TechRepMixture columns go from NA to the actual names. There may also be other reasons to wait until the new release, but try changing the Run column and see if that fixes this issue.

Yep, the way the MSstatsTMT function is checking the match is a bit weird. I just imported my data as follows and it worked fine.

msstats_df <- read_delim("msstats.csv", delim = ",", escape_double = FALSE, trim_ws = TRUE) %>% mutate(Spectrum.File = str_remove(Spectrum.File, ".mzML"))

So I would say, as long as you remove the .raw or .mzML in FragPipe's msstats.csv output rather than in the annotation file, it should work.
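
One caveat when stripping the extension with a regular expression: an unescaped `.` matches any character, so a pattern like ".mzML" can also match in the middle of a file name. A slightly more defensive sketch in base R, assuming only that the extension is .raw or .mzML at the end of the name (`strip_run_extension` is a hypothetical helper, not part of FragPipe or MSstatsTMT):

```r
# Hypothetical helper: drop a trailing .raw or .mzML (any letter case).
# The dot is escaped and the pattern is anchored to the end of the string.
strip_run_extension <- function(run_names) {
  sub("\\.(raw|mzml)$", "", run_names, ignore.case = TRUE)
}

# Toy file names, made up for illustration.
runs <- c("sample_A.mzML", "sample_B.raw", "sample_C", "xmzML_test.mzML")
stripped <- strip_run_extension(runs)
print(stripped)
```

Applying the same helper to both the Run column of the annotation and the Spectrum.File column of the input keeps the two sides consistent, whichever form you settle on.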

luizalmeida93 commented 7 months ago

Hi, I am trying to use Fragpipe for TMT 6-plex and can't generate the files compatible with MSstats. I was wondering if I am doing anything wrong or if Fragpipe/Philosopher has some sort of problem with TMT 6-plex (as I've seen in other posts).

In more detail: I am running FragPipe v21, MSFragger v4.0, IonQuant v1.10.12, and Philosopher v5.1.0. I have 3 TMT 6-plex experiments, and I converted them from RAW to mzML. I tried two different approaches:

1) Followed the tutorial for multiple plexes on the website, which is basically the same as the Docs linked above.

We have a version that support MSstatsTMT better. We have a tutorial about it: https://docs.google.com/document/d/1TqO9WDI3k_1FTOI1dQYV4D4nf7C9TX7Xl9AzHxYNe84/edit

In short:

All went well, but the only msstats.csv I found was the one generated in the output folder, and it does not contain information per channel. The quantification seems to be pooled per TMT-plex. I also checked the "tmt-report" folder, but the data is already summarized to proteins, so it won't be compatible with MSstats. I also checked each TMT folder output but didn't find any msstats.csv in them.

When checking the MSstatsTMT HTML tutorial, little information is provided as to which files to use; the only information is to use "PhilosophertoMSstatsTMTFormat()", which leads me to the other test below.

2) Using "Philosopher" as the Intensity Extraction Tool.

I tried using Philosopher, which should be compatible with MSstatsTMT. However, "Philosopher Abacus" crashed, so I followed the recommendation in #1324, which is to disable "Generate reports" and "Generate MSstats files". The search then finished, but I am still missing MSstatsTMT-compatible msstats.csv files.

Am I missing something? Or is TMT-6plex output not supported for MSstatsTMT?

I can provide the RAW (or MzML) if you need it.

fcyu commented 7 months ago

For the current version, you should use Philosopher as the intensity extraction tool to generate the MSstatsTMT-compatible msstats.csv. In the future, we will make it more robust to support both Philosopher and IonQuant.

I tried using Philosopher, which should be compatible with MSstatsTMT. However, "Philosopher Abacus" crashed, so I followed the recommendation on https://github.com/Nesvilab/FragPipe/issues/1324, which is to disable "Generate reports" and "Generate MSstats files". It then finished the search, but I am still missing msstats.csv compatible files.

You need to enable "Generate reports" and "Generate MSstats files" to generate the TMT msstats.csv. Could you share the log from the run in which Abacus crashed?

Thanks,

Fengchao

luizalmeida93 commented 6 months ago

Hi Fengchao,

Thank you for such a speedy response.

I am attaching the log file. Please, let me know if you need anything else! log_2024-02-18_13-22-23.txt

Best, Luiz

fcyu commented 6 months ago

Hi Luiz,

Thanks for the log file. It looks like Abacus does not support TMT 6:

ERRO[13:22:23] unsupported number of labels      

I am afraid you have to wait for the future release.

Best,

Fengchao

luizalmeida93 commented 6 months ago

I see, that's ok, at least now I know I am not doing something wrong on my end.

I won't be able to use FragPipe to generate TMT 6-plex output compatible with MSstatsTMT. Assuming I use another tool for data analysis and want information at the peptide level, do you recommend switching the extraction to IonQuant, or staying with Philosopher but disabling both "Generate reports" and "Generate MSstats files"?

fcyu commented 6 months ago

If you use the TMT-Integrator reports in the tmt-report folder, it makes little difference whether you use IonQuant or Philosopher, except that IonQuant is faster and supports the raw file format.

And yes, you need to disable "generate reports" and "generate MSstats files".

Best,

Fengchao

luizalmeida93 commented 6 months ago

Hi Fengchao,

I was reading the log file and noticed that DIA-NN gets triggered even though I loaded DDA files and did not enable "Spectral library generation" or "Quant (DIA)". Is there any reason for it?

I have two additional questions unrelated to MSstatsTMT. I am listing them below, but I can move them to a separate issue if that works better for you.

1) I read the "Clip N-term M" description in the MSFragger wiki, but it is still unclear to me. Is it removing all N-terminal methionines during in silico generation of the peptides? Is there any specific reason why I should uncheck it?

2) When FragPipe runs out of memory, is there any way to estimate the required RAM? E.g., I have a dataset where I split the data into 25, which did not fix the problem. Only reducing the max peptide size from 50 to 25 solved the issue, and I could only find that by trial and error.

Best, Luiz

fcyu commented 6 months ago

Hi Luiz,

I was reading the log file and noticed that DIA-NN gets triggered even though I loaded DDA files and did not enable "Spectral library generation" or "Quant (DIA)". Is there any reason for it?

I guess what you were looking at was MSBooster using the DIA-NN spectral prediction module to predict spectra and calculate identification scores. It is unrelated to DIA.

I read the "Clip N-term M" description in MSfragger wiki, but it is still unclear to me. Is it removing all n-terminal methionine during in silico generation of the peptides? Is there any specific reason why I should uncheck it?

It considers both forms, with and without the N-terminal M, because most N-terminal M is clipped in vivo.
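
As a toy illustration of "both forms" (not MSFragger's actual implementation, and `n_term_m_variants` is a hypothetical name), a protein sequence that starts with M would contribute two candidate N-termini to the in silico digest:

```r
# Illustration only: return the sequence as given plus, when it starts
# with methionine, the variant with that first residue removed.
n_term_m_variants <- function(protein_sequence) {
  variants <- protein_sequence
  if (startsWith(protein_sequence, "M") && nchar(protein_sequence) > 1) {
    variants <- c(variants, substring(protein_sequence, 2))
  }
  variants
}

# Made-up sequences for demonstration.
with_m <- n_term_m_variants("MKTAYIAK")
without_m <- n_term_m_variants("AKTAYIAK")
print(with_m)
print(without_m)
```

A sequence that does not start with M is returned unchanged, so the option only adds candidates for proteins with an initiator methionine.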

When FragPipe runs out of memory, is there any way to estimate the required RAM? E.g., I have a dataset where I split the data into 25, which did not fix the problem. Only reducing the max peptide size from 50 to 25 solved the issue, and I could only find that by trial and error.

Unfortunately, no. One trick is that you need to set the mass calibration to "None" if your search space is very big, because the first search of the mass calibration does not split the database.

Best,

Fengchao