During the standard analysis, filter_missval() silently sorts the rowData alphabetically. If we skip this step, this results in a rowData where the rownames and the name column contain the same information, but add_rejections() later sorts on the name column without reordering rownames, getting values that do not match. This creates problems later on when functions such as single_plot() use subset(), which work on rownames, then tries to use the name column.
For example, with the steps from ?add_rejections:
# Load example
data <- UbiLength
data <- data[data$Reverse != "+" & data$Potential.contaminant != "+",]
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")
# Make SummarizedExperiment
columns <- grep("LFQ.", colnames(data_unique))
exp_design <- UbiLength_ExpDesign
se <- make_se(data_unique, columns, exp_design)
# Filter, normalize and impute missing values
# filt <- filter_missval(se, thr = 0) # <- no filtering step: no sorting!
norm <- normalize_vsn(se)
imputed <- impute(norm, fun = "MinProb", q = 0.01)
# Test for differentially expressed proteins
diff <- test_diff(imputed, "control", "Ctrl")
dep <- add_rejections(diff, alpha = 0.05, lfc = 1)
Then the resulting object has wrong names:
SummarizedExperiment::rowData(dep, use.names = TRUE) |>
as.data.frame() |> head() |> dplyr::select(name, ID)
#> name ID
#> RBM47 A7E2Y1-3 A7E2Y1-3
#> UBA6 AAAS Q9NRG9
#> ILVBL AAMDC Q9H7C9-3
#> FAM92A1 AAR2 Q9Y312
#> ACO2 ABCB6 H7BXK9
#> RPRD1B ABCB7 O75027-3
plot_single(dep, proteins = c("A7E2Y1-3","AAAS"))
#> Warning: 6 parsing failures.
#> row col expected actual
#> 1 -- value in level set ACTR3
#> 2 -- value in level set ACTR3
#> 3 -- value in level set ACTR3
#> 4 -- value in level set TUFM
#> 5 -- value in level set TUFM
#> ... ... .................. ......
#> See problems(...) for more details.
For users: always use filter_missval(), even with a high threshold.
During the standard analysis,
filter_missval()
silently sorts therowData
alphabetically. If we skip this step, this results in arowData
where therownames
and thename
column contain the same information, butadd_rejections()
later sorts on thename
column without reorderingrownames
, getting values that do not match. This creates problems later on when functions such assingle_plot()
usesubset()
, which work on rownames, then tries to use thename
column.For example, with the steps from
?add_rejections
:Then the resulting object has wrong names:
For users: always use
filter_missval()
, even with a high threshold.