Vitek-Lab / MSstatsPTM

Post Translational Modification (PTM) Significance Analysis in shotgun mass spectrometry-based proteomic experiments
https://vitek-lab.github.io/MSstatsPTM/
Artistic License 2.0

Error occurred in dataSummarizationPTM_TMT step. #28

Closed (ghost closed 2 years ago)

ghost commented 2 years ago

Hi MSstatsPTM developer, we are very excited about the PTM analysis capability in the MSstatsPTM package and want to add it to our analysis workflow. I am getting an error when running MSstatsPTM::dataSummarizationPTM_TMT(). Could you provide some suggestions?

Errors:

> quant.mstats = MSstatsPTM::dataSummarizationPTM_TMT(input.tmt)
Error in checkHT(n, dx <- dim(x)) : invalid 'n' - must contain at least one non-missing element, got none.

I wonder if you know where this error message came from.

Could you point me to the R script and the code sections that raise this error? Or are there possible causes you can think of? I wonder whether I made a mistake when preparing the input data.


Here are more details:

The data I used was from Proteome Discoverer, so I used the MSstatsTMT::PDtoMSstatsTMTFormat() data conversion functions with a little adjustment to create the PTM data.table as suggested in MSstatsPTM_TMT_Workflow.html

The PTM data.table looks like this (I used random strings for the protein names and modified peptide sequences):

ProteinName                PeptideSequence Charge                              PSM Mixture TechRepMixture Run Channel
1   ABCDE_T11          [-].mAACCCAGSGtPR.[E]      2          [-].mAACCCAGSGtPR.[E]_2       1              1 1_1     126
2   ABCDE_T11          [-].mAACCCAGSGtPR.[E]      2          [-].mAACCCAGSGtPR.[E]_2       1              1 1_1    127C
45      FGHIJK                 [-].mADDNk.[G]      2                 [-].mADDNk.[G]_2       1              1 1_1    132N
46      FGHIJK                 [-].mADDNk.[G]      2                 [-].mADDNk.[G]_2       1              1 1_1    133C
49  ZZZZZ_S74       [-].mAACCCCCCCPPLsPk.[S]      3       [-].mAACCCCCCCPPLsPk.[S]_3       1              1 1_1     126
50  ZZZZZ_S74       [-].mAACCCCCCCPPLsPk.[S]      3       [-].mAACCCCCCCPPLsPk.[S]_3       1              1 1_1    127C
89   PROTEIN_S5 [-].mAACsCCCCCCCDDDDDDDDSk.[S]      2 [-].mAACsCCCCCCCDDDDDDDDSk.[S]_2       1              1 1_1    130N
90   PROTEIN_S5 [-].mAACsCCCCCCCDDDDDDDDSk.[S]      2 [-].mAACsCCCCCCCDDDDDDDDSk.[S]_2       1              1 1_1    131C

      BioReplicate     Condition Intensity
1   Condition_P0_1  Condition_P0      83.0
2   Condition_P4_1  Condition_P4      98.5
45  Condition_R4_2  Condition_R4     118.3
46 Condition_R10_1 Condition_R10     101.9
51  Condition_P0_2  Condition_P0     131.1
89   Parental_10_2   Parental_10      50.9
90  Condition_R4_1  Condition_R4      32.3

The PROTEIN data.table is the table directly from MSstatsTMT::PDtoMSstatsTMTFormat()

Here is the data generation workflow:
The whole proteome experiment -> Proteome Discoverer -> export PSM file -> MSstatsTMT::PDtoMSstatsTMTFormat() -> MSstatsPTM
The PTM enrichment experiment -> Proteome Discoverer -> export PSM file -> MSstatsTMT::PDtoMSstatsTMTFormat() -> MSstatsPTM

Combine PTM and PROTEIN: input.tmt <- list(PTM= ptm.data.table, PROTEIN = protein.data.table)

Is there anything you think might be causing the checkHT error?

Thank you!

devonjkohler commented 2 years ago

Hi @maohi

Thank you for your interest in MSstatsPTM! The processing you did on the data so far looks correct from what I can see. It is hard to diagnose the problem because I'm not sure where the checkHT error is coming from. Would you be able to provide some small/example data that creates the same issue?

Devon

devonjkohler commented 2 years ago

Hi @maohi,

After some additional testing I was able to reproduce and fix this bug. A change in a dependency caused one of the conversions to data.table to fail. I have implemented the bug fix and pushed it to both GitHub and Bioconductor. The Bioconductor fix will take a day or two to propagate. In the meantime, feel free to install directly from the GitHub master branch!

Best, Devon

ghost commented 2 years ago

Hi @devonjkohler Thank you so much for the fix. I will try it again.

Han-Yin

ghost commented 2 years ago

Hi @devonjkohler, I really appreciate your help. I installed version 1.5.2 directly from GitHub, but I am still getting the same error:

> quant.mstats = MSstatsPTM::dataSummarizationPTM_TMT(input.tmt)
Error in checkHT(n, dx <- dim(x)) : invalid 'n' - must contain at least one non-missing element, got none.

I will try to make a test case using published data sets later, but I am wondering if you know which dependency (package or script) throws that error message. I want to double-check that I have also updated that dependency. Could you point it out to me?

Thank you! Best, Han-Yin

ghost commented 2 years ago

Hi @devonjkohler I think I know where the error occurred.

The object returned by MSstatsConvert is a data.frame of class MSstatsValidated, which cannot be converted to a data.table using data.table::as.data.table(). The data conversion functions in MSstatsPTM call as.data.table(), so they fail on this object.

Example:

> ptm.data = MSstatsTMT::PDtoMSstatsTMTFormat(input_filename)
> as.data.table(ptm.data)
Error in checkHT(n, dx <- dim(x)) : invalid 'n' - must contain at least one non-missing element, got none.

What I ended up doing was converting the data.frame of class MSstatsValidated to a plain base::data.frame before passing the data to MSstatsPTM::dataSummarizationPTM_TMT().

Example:

> pro.data = MSstatsTMT::PDtoMSstatsTMTFormat(pd_pro_input_filename)
> pro.df = pro.data %>% data.frame()
> ptm.data = MSstatsTMT::PDtoMSstatsTMTFormat(pd_ptm_inputfilename)
... (some reformatting steps to add modification information in ProteinName)
> ptm.df = ptm.data %>% data.frame()
> input.tmt <- list(PTM = ptm.df, PROTEIN = pro.df)
> MSstatsPTM::dataSummarizationPTM_TMT(input.tmt)
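To illustrate the idea behind this coercion without the real data, here is a minimal base R sketch on a toy table. It assumes the extra class can be modeled as an S3 class attribute placed in front of "data.frame". The class name "ToyValidated" and the toy values are hypothetical, and the actual MSstatsValidated class may be implemented differently, so treat this as an illustration rather than a description of MSstatsConvert internals.

```r
# Toy stand-in for a converter output.
# ASSUMPTION: the extra class is modeled as an S3 class attribute;
# the real MSstatsValidated class may be implemented differently.
toy <- data.frame(
  ProteinName = c("ABCDE_T11", "FGHIJK"),
  Intensity   = c(83.0, 98.5)
)
class(toy) <- c("ToyValidated", "data.frame")  # "ToyValidated" is hypothetical

# as.data.frame() dispatches to the data.frame method, which drops any
# classes listed before "data.frame" and keeps the columns unchanged.
plain <- as.data.frame(toy)

class(toy)    # class vector before coercion
class(plain)  # class vector after coercion
```

Comparing the two class() results before and after the coercion is a quick way to confirm that an object is a plain data.frame before handing it to a downstream function.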

The dataSummarizationPTM_TMT step is done without error. I will use this result to try next step.


Best, Han-Yin