This is quite a rare case, but I thought it was worth letting you know, since the possible solution is relatively easy (if any modification is even needed, given that the input of colData should be a DataFrame).
The problem is that when the rownames of colData are in numeric ID format (e.g., "1000", "1001"...) and the input is a data.frame, these rownames are not checked against the colnames of the assay. This can lead to a situation where metadata is assigned to the wrong sample. (The same thing also happens for rowData.)
When a data.frame is converted into a DataFrame, numeric rownames are dropped:
library(S4Vectors)
# Create data.frames
df <- data.frame(matrix(1:9, nrow = 3))
df2 <- df
# By default the rowname is numeric ID
rownames(df)
[1] "1" "2" "3"
# Add rownames
rownames(df) <- paste0("row", 1:3)
rownames(df2) <- 1:3
# Show rownames
rownames(df)
[1] "row1" "row2" "row3"
rownames(df2)
[1] "1" "2" "3"
# Convert into DataFrame with explicit rownames
# (new names are used so that the original data.frames are not overwritten)
DF <- DataFrame(df, row.names = rownames(df))
DF2 <- DataFrame(df2, row.names = rownames(df2))
# Show rownames (rownames are preserved)
rownames(DF)
[1] "row1" "row2" "row3"
rownames(DF2)
[1] "1" "2" "3"
# Convert the original data.frames into DataFrame without passing rownames
DF <- DataFrame(df)
DF2 <- DataFrame(df2)
# Show rownames (numeric rownames are dropped)
rownames(DF)
[1] "row1" "row2" "row3"
rownames(DF2)
NULL
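One way to spot which data.frame inputs are at risk is to look at how base R stores their row names. The sketch below uses base R only. `has_integer_rownames` is a hypothetical helper name, and the link to the behaviour above is an assumption (that the conversion discards row names stored as integers), not something confirmed here against the S4Vectors source.

```r
# Hypothetical helper: TRUE when the row names of a data.frame are stored
# internally as integers (automatic 1..n, or explicit integer IDs).
has_integer_rownames <- function(x) {
  stopifnot(is.data.frame(x))
  # type = 0L returns the internal representation of the row names
  is.integer(.row_names_info(x, 0L))
}

named <- data.frame(a = 1:3, row.names = paste0("row", 1:3))
ids <- data.frame(a = 1:3)
rownames(ids) <- c(1000L, 1001L, 1002L)

has_integer_rownames(named)  # character row names
has_integer_rownames(ids)    # integer row names

# Storing the IDs as character strings is one possible way to remove the
# ambiguity (assumption: character row names survive the conversion,
# as "row1" ... "row3" do in the transcript above).
rownames(ids) <- as.character(rownames(ids))
has_integer_rownames(ids)
```

Whether this check is the right predictor depends on the assumption stated above, so it is best treated as a diagnostic aid rather than a guarantee.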
Here is an example of how this might occur when a SummarizedExperiment is constructed:
library(SummarizedExperiment)
# assay data
counts <- rbind(rep(0, 3), matrix(1:9, nrow = 3))
colnames(counts) <- paste0("sample", 1:3)
# col data
colData <- data.frame(Treatment=c("X", "X", "Y"),row.names=colnames(counts))
colData <- colData[c(2,3,1,4), , drop = FALSE ]
# col data with numeric rownames
colData2 <- data.frame(Treatment=c("X", "X", "Y"), row.names = 1:3)
colData2 <- colData2[c(2,3,1,0), , drop = FALSE ]
# Does not work: colData with character rownames is validated against the assay
se <- SummarizedExperiment(assays=SimpleList(counts=counts), colData=colData)
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object:
nb of cols in 'assay' (3) must equal nb of rows in 'colData' (4)
# Works, but the order of samples differs between colData and the assay
# (the original call used an undefined object `counts2`; `counts` is used instead)
se <- SummarizedExperiment(assays=SimpleList(counts=counts), colData=colData2)
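Until such names are checked by the constructor itself, a caller can align the metadata explicitly before building the object. The following base R sketch shows one way to do that. `align_col_data`, `counts_ids` and `meta` are hypothetical names introduced for illustration, not part of SummarizedExperiment, and the sketch assumes that the assay column names and the colData row names are meant to carry the same sample IDs.

```r
# Hypothetical helper: reorder a colData data.frame to follow the assay columns,
# and fail loudly when the sample IDs do not correspond.
align_col_data <- function(assay, col_data) {
  stopifnot(is.matrix(assay), is.data.frame(col_data))
  sample_ids <- colnames(assay)
  if (is.null(sample_ids)) {
    stop("assay has no column names to match against")
  }
  # rownames() on a data.frame always returns character, even for numeric IDs
  idx <- match(sample_ids, rownames(col_data))
  if (anyNA(idx) || nrow(col_data) != length(sample_ids)) {
    stop("colData row names do not match assay column names")
  }
  col_data[idx, , drop = FALSE]
}

# Illustrative assay whose samples are identified by numeric IDs
counts_ids <- matrix(1:9, nrow = 3, dimnames = list(NULL, c("1", "2", "3")))
meta <- data.frame(Treatment = c("X", "X", "Y"), row.names = 1:3)
meta <- meta[c(2, 3, 1), , drop = FALSE]  # shuffled on purpose

aligned <- align_col_data(counts_ids, meta)
rownames(aligned)
aligned$Treatment
```

The aligned data.frame can then be passed as colData. Because its row order already follows the assay, losing the numeric row names during conversion should no longer attach metadata to the wrong sample.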
Hello,
@catiapacifico opened an issue in our framework, which uses SummarizedExperiment: https://github.com/microbiome/OMA/issues/202
This behaviour is caused by the DataFrame conversion at https://github.com/Bioconductor/SummarizedExperiment/blob/8df97720354bdeecaf947f574dddd09da02d55b2/R/RangedSummarizedExperiment-class.R#L111 (for colData) and https://github.com/Bioconductor/SummarizedExperiment/blob/8df97720354bdeecaf947f574dddd09da02d55b2/R/RangedSummarizedExperiment-class.R#L138 (for rowData).
Session info
R Under development (unstable) (2022-11-24 r83383)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 21

Matrix products: default
BLAS: /opt/R/devel/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=fi_FI.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=fi_FI.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Helsinki
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] scater_1.27.2 scuttle_1.9.2 forcats_0.5.2 stringr_1.4.1
[5] dplyr_1.0.10 purrr_0.3.5 readr_2.1.3 tidyr_1.2.1
[9] tibble_3.1.8 tidyverse_1.3.2 tidySummarizedExperiment_1.9.2 patchwork_1.1.2
[13] miaViz_1.7.0 ggraph_2.1.0 ggplot2_3.4.0 mia_1.5.17
[17] testthat_3.1.5 MultiAssayExperiment_1.25.1 TreeSummarizedExperiment_2.7.0 Biostrings_2.67.0
[21] XVector_0.39.0 SingleCellExperiment_1.21.0 SummarizedExperiment_1.29.1 Biobase_2.59.0
[25] GenomicRanges_1.51.1 GenomeInfoDb_1.35.5 IRanges_2.33.0 S4Vectors_0.37.0
[29] BiocGenerics_0.45.0 MatrixGenerics_1.11.0 matrixStats_0.63.0

loaded via a namespace (and not attached):
[1] splines_4.3.0 later_1.3.0 bitops_1.0-7 ggplotify_0.1.0 cellranger_1.1.0
[6] polyclip_1.10-4 reprex_2.0.2 DirichletMultinomial_1.41.0 lifecycle_1.0.3 rprojroot_2.0.3
[11] processx_3.8.0 lattice_0.20-45 MASS_7.3-58.1 backports_1.4.1 magrittr_2.0.3
[16] plotly_4.10.1 remotes_2.4.2 httpuv_1.6.6 sessioninfo_1.2.2 pkgbuild_1.3.1
[21] DBI_1.1.3 lubridate_1.9.0 pkgload_1.3.2 zlibbioc_1.45.0 rvest_1.0.3
[26] RCurl_1.98-1.9 yulab.utils_0.0.5 tweenr_2.0.2 GenomeInfoDbData_1.2.9 ggrepel_0.9.2
[31] irlba_2.3.5.1 tidytree_0.4.1 vegan_2.6-4 permute_0.9-7 DelayedMatrixStats_1.21.0
[36] codetools_0.2-18 DelayedArray_0.25.0 xml2_1.3.3 ggforce_0.4.1 tidyselect_1.2.0
[41] aplot_0.1.9 farver_2.1.1 ScaledMatrix_1.7.0 viridis_0.6.2 googledrive_2.0.0
[46] jsonlite_1.8.3 BiocNeighbors_1.17.1 decontam_1.19.0 ellipsis_0.3.2 tidygraph_1.2.2
[51] tools_4.3.0 ggnewscale_0.4.8 treeio_1.23.0 Rcpp_1.0.9 glue_1.6.2
[56] gridExtra_2.3 mgcv_1.8-41 usethis_2.1.6 withr_2.5.0 fastmap_1.1.0
[61] fansi_1.0.3 callr_3.7.3 digest_0.6.30 rsvd_1.0.5 timechange_0.1.1
[66] R6_2.5.1 mime_0.12 gridGraphics_0.5-1 colorspace_2.0-3 RSQLite_2.2.19
[71] googlesheets4_1.0.1 utf8_1.2.2 generics_0.1.3 data.table_1.14.6 DECIPHER_2.27.0
[76] prettyunits_1.1.1 graphlayouts_0.8.4 httr_1.4.4 htmlwidgets_1.5.4 pkgconfig_2.0.3
[81] gtable_0.3.1 blob_1.2.3 brio_1.1.3 htmltools_0.5.3 profvis_0.3.7
[86] scales_1.2.1 ggfun_0.0.9 rstudioapi_0.14 tzdb_0.3.0 reshape2_1.4.4
[91] nlme_3.1-160 cachem_1.0.6 parallel_4.3.0 miniUI_0.1.1.1 vipor_0.4.5
[96] desc_1.4.2 pillar_1.8.1 grid_4.3.0 vctrs_0.5.1 urlchecker_1.0.1
[101] promises_1.2.0.1 BiocSingular_1.15.0 dbplyr_2.2.1 beachmat_2.15.0 xtable_1.8-4
[106] cluster_2.1.4 beeswarm_0.4.0 cli_3.4.1 compiler_4.3.0 rlang_1.0.6
[111] crayon_1.5.2 modelr_0.1.10 ps_1.7.2 plyr_1.8.8 fs_1.5.2
[116] ggbeeswarm_0.6.0 stringi_1.7.8 viridisLite_0.4.1 BiocParallel_1.33.6 assertthat_0.2.1
[121] munsell_0.5.0 lazyeval_0.2.2 devtools_2.4.5 Matrix_1.5-3 hms_1.1.2
[126] sparseMatrixStats_1.11.0 bit64_4.0.5 shiny_1.7.3 haven_2.5.1 gargle_1.2.1
[131] igraph_1.3.5 broom_1.0.1 memoise_2.0.1 ggtree_3.7.1 bit_4.0.5
[136] readxl_1.4.1 ape_5.6-2

-Tuomas