Vitek-Lab / MSstats

R package - MSstats

the values of pvalue and adj.pvalue are NaN in DDA2009.comparisons[["ComparisonResult"]] #83

Closed Douerww closed 2 years ago

Douerww commented 2 years ago

Data from here; the code is as follows:

fileData <- read.csv('out_msstats.csv')
DDA2009.proposed <- MSstats::dataProcess(raw = fileData,
                                         normalization = 'equalizeMedians',
                                         summaryMethod = 'linear',  
                                         censoredInt = "NA",
                                         MBimpute = TRUE)

# Build a contrast matrix with one row per pairwise comparison of conditions
len <- length(levels(DDA2009.proposed$FeatureLevelData$GROUP))
tmp <- t(combn(len, 2))  # each row holds one pair of condition indices
matrix_len <- nrow(tmp)
ourMatrix <- matrix(0, nrow = matrix_len, ncol = len)
for (i in seq_len(matrix_len)) {
  ourMatrix[i, tmp[i, 1]] <- -1
  ourMatrix[i, tmp[i, 2]] <- 1
}
ourCondition <- levels(DDA2009.proposed$ProteinLevelData$GROUP)
# Name each row "<second condition>-<first condition>"
name <- character(matrix_len)
for (i in seq_len(matrix_len)) {
  name[i] <- sprintf('%s-%s', ourCondition[tmp[i, 2]], ourCondition[tmp[i, 1]])
}
row.names(ourMatrix) <- name
colnames(ourMatrix) <- ourCondition

DDA2009.comparisons <- MSstats::groupComparison(contrast.matrix = ourMatrix,
                                                data = DDA2009.proposed)

In DDA2009.comparisons[["ComparisonResult"]], every statistical column except log2FC (including pvalue and adj.pvalue) is NaN. I would like to know what could be causing this problem. Any ideas would help, thanks!

PS: I used summaryMethod = 'linear' when calling dataProcess() because choosing 'TMP' raises an error. This problem does not occur with other data when using 'linear'.
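For readers following along, the pairwise contrast matrix built in the post can also be written as a small vectorized helper with a sanity check. This is only a sketch in base R: `make_pairwise_contrasts` is a hypothetical name, not an MSstats function, and the condition labels are made up.

```r
# Hypothetical helper (not part of MSstats): one row per pair of conditions,
# with -1 for the first condition of the pair and +1 for the second.
make_pairwise_contrasts <- function(conditions) {
  pairs <- t(combn(length(conditions), 2))  # one row per pair (i < j)
  row_names <- sprintf("%s-%s", conditions[pairs[, 2]], conditions[pairs[, 1]])
  contrast <- matrix(0, nrow = nrow(pairs), ncol = length(conditions),
                     dimnames = list(row_names, conditions))
  rows <- seq_len(nrow(pairs))
  contrast[cbind(rows, pairs[, 1])] <- -1
  contrast[cbind(rows, pairs[, 2])] <- 1
  contrast
}

# Made-up condition labels, for illustration only.
contrasts <- make_pairwise_contrasts(c("A", "B", "C"))

# Each comparison should sum to zero across conditions.
stopifnot(all(rowSums(contrasts) == 0))
```

With real data, the argument would be the condition levels taken from the dataProcess() output, as in the post above.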

mstaniak commented 2 years ago

I will check this. However, 'linear' is not the recommended summarization method. TMP should work just fine; could you please also include the code and the error that occurs with TMP?

Douerww commented 2 years ago

OK, the code and errors look like this: [image]

mstaniak commented 2 years ago

Please let me know if installing these dev versions of MSstats packages:

devtools::install_github("Vitek-Lab/MSstatsConvert", ref = "hotfix-techreplicate")
devtools::install_github("Vitek-Lab/MSstats", ref = "hotfix-fractions-check")

fixes the issue.

Douerww commented 2 years ago

Still an error. Could it be related to some of my packages not being updated to the latest version? [images]

Error in MSstatsMergeFractions(input) : object 'match_runs' not found

mstaniak commented 2 years ago

It was a problem within our code, not the dependencies. I need to double-check the solution and will let you know ASAP whether it works.

Douerww commented 2 years ago

ok, thanks!

mstaniak commented 2 years ago

It was actually a reporting problem. The correct behavior is that the function should stop before the summarization step, but we didn't catch/report the error correctly. Your dataset has technical replicates. Please add a TechReplicate column to indicate technical replicates, and the problem should disappear. Before you proceed, please re-install MSstats from the branch that I suggested earlier; I'm about to push an update there.
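As a rough illustration of adding such a column, the sketch below numbers the runs within each Condition/BioReplicate/Fraction group of a toy run table. The table, the run names, and the assumption that row order reflects injection order are all hypothetical; the right labels depend on the actual experimental design.

```r
# Toy run annotation (made-up values): one biological replicate,
# two fractions, each fraction injected twice.
runs <- data.frame(
  Condition    = "A",
  BioReplicate = 1,
  Fraction     = c(1, 1, 2, 2),
  Run          = c("inj1_f1", "inj2_f1", "inj1_f2", "inj2_f2"),
  stringsAsFactors = FALSE
)

# Assumption: within each Condition/BioReplicate/Fraction group, the row
# order corresponds to the order of the technical replicates.
runs$TechReplicate <- ave(
  seq_len(nrow(runs)),
  runs$Condition, runs$BioReplicate, runs$Fraction,
  FUN = seq_along
)
```

The resulting Run-to-TechReplicate mapping could then be joined onto the full input table by the Run column, for example with merge().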

Douerww commented 2 years ago

After reinstalling the MSstats from the branch and adding the TechReplicate column (data from here), I still get the following error:

> DDA2009.proposed <- MSstats::dataProcess(raw = fileData,
+                                          normalization = 'equalizeMedians',
+                                          summaryMethod = 'TMP',
+                                          #summaryMethod = 'linear',  
+                                          censoredInt = "NA",
+                                          MBimpute = TRUE)
INFO  [2022-02-06 18:38:39] ** Multiple fractionations exist: 24 fractionations per MS replicate.
INFO  [2022-02-06 18:39:52] ** Features with one or two measurements across runs are removed.
INFO  [2022-02-06 18:39:52] ** Fractionation handled.
INFO  [2022-02-06 18:39:52] ** Updated quantification data to make balanced design. Missing values are marked by NA
Error in MSstatsMergeFractions(input) : 
  *** error : can't figure out which multiple runs come from the same sample.
> # If dataProcess() stops with an error message, change "summaryMethod = 'TMP'" to "summaryMethod = 'linear'"
> DDA2009.proposed <- MSstats::dataProcess(raw = fileData,
+                                          normalization = 'equalizeMedians',
+                                          summaryMethod = 'linear',  
+                                          censoredInt = "NA",
+                                          MBimpute = TRUE)
INFO  [2022-02-06 23:16:04] ** Multiple fractionations exist: 24 fractionations per MS replicate.
INFO  [2022-02-06 23:17:20] ** Features with one or two measurements across runs are removed.
INFO  [2022-02-06 23:17:20] ** Fractionation handled.
INFO  [2022-02-06 23:17:20] ** Updated quantification data to make balanced design. Missing values are marked by NA
Error in MSstatsMergeFractions(input) : 
  *** error : can't figure out which multiple runs come from the same sample.

mstaniak commented 2 years ago

Hi, the latest update to the hotfix-fractions-check branch should fix this problem. Thanks for your patience and for providing data for testing.

Douerww commented 2 years ago

it works! thank you!

Douerww commented 2 years ago

Hi, I have found a new problem: CSV files (LFQ and TMT types) that previously ran smoothly now report an error, as follows:

Warning: Error in [.data.table: column(s) not found: TECHREPLICATE

Is a [TechReplicate] column required whenever the CSV contains a [Fraction] column?

mstaniak commented 2 years ago

Definitely not for TMT data, because we handle fractionation differently there. For LFQ, it is needed only if there are actually technical replicates in the data. Can I see a full traceback() or an example data snippet?
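One way to check whether an LFQ input actually contains technical replicates is to count the distinct runs per Condition/BioReplicate/Fraction combination. The sketch below uses a made-up run table in base R; it is a user-side check under these assumptions, not how MSstats itself detects replicates.

```r
# Toy run annotation (made-up values). Sample A/1 has fraction 1 injected twice.
runs <- data.frame(
  Condition    = c("A", "A", "A", "B", "B"),
  BioReplicate = c(1, 1, 1, 2, 2),
  Fraction     = c(1, 1, 2, 1, 2),
  Run          = c("a1_f1_inj1", "a1_f1_inj2", "a1_f2", "b2_f1", "b2_f2"),
  stringsAsFactors = FALSE
)

# Count distinct runs for every Condition/BioReplicate/Fraction combination.
runs_per_sample <- aggregate(
  Run ~ Condition + BioReplicate + Fraction,
  data = unique(runs),
  FUN = length
)
names(runs_per_sample)[names(runs_per_sample) == "Run"] <- "n_runs"

# More than one run for the same combination suggests technical replicates.
has_tech_replicates <- any(runs_per_sample$n_runs > 1)
```

With a full input table, the same count would be taken over the unique rows of those four columns rather than over every feature row.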

mstaniak commented 2 years ago

OK, that may not be necessary. Please re-install from the hotfix-fractions-check branch again. TechReplicate should be optional again.

Douerww commented 2 years ago

it works, thanks!