MarioniLab / miloR

R package implementation of Milo for testing for differential abundance in KNN graphs
https://bioconductor.org/packages/release/bioc/html/miloR.html
GNU General Public License v3.0

Error in S4Vectors #348

Open avfentor opened 1 month ago

avfentor commented 1 month ago

Describe the bug

Hi there, I'm having an issue at the "rownames(seurat_milo) <- seurat_milo$mouse_id" step (mouse_id comes from my own dataset) of the "Differential abundance testing with Milo" vignette. This is the step right before the da_results object is created. rownames(seurat_milo) and seurat_milo$mouse_id have different lengths: 24137 and 28910, respectively. In other comparisons where the difference went the other way, with more row names than mouse_id values, the function worked fine. Could you help me solve this issue?

Minimum code example

add subset of condition

seurat_hypox <- seurat.filtered.clusters %>% subset(subset = Condition == 'hypoxia')

Create a Milo object from Seurat object

seurat_sce_hypox <- as.SingleCellExperiment(seurat_hypox) # transform Seurat object to a SingleCellExperiment object
seurat_milo <- Milo(seurat_sce_hypox)
reducedDim(seurat_milo, "UMAP") <- reducedDim(seurat_sce_hypox, "UMAP")

Construct KNN graph

seurat_milo <- buildGraph(seurat_milo, k=48, d=30)

Define representative neighborhoods

seurat_milo <- makeNhoods(seurat_milo, prop=0.1, k=48, d=30, refined=TRUE)

start_plot("DA_NHood_Histogram")
plotNhoodSizeHist(seurat_milo)
end_plot()

Counting cells in neighborhoods

seurat_milo <- countCells(seurat_milo, meta.data = data.frame(colData(seurat_milo)), samples = "mouse_id")

Differential abundance testing

traj_design <- data.frame(colData(seurat_milo))[,c("mouse_id", "genotypeshort")]
traj_design <- distinct(traj_design)
rownames(traj_design) <- traj_design$mouse_id

Reorder rownames to match columns of nhoodCounts(milo)

traj_design <- traj_design[colnames(nhoodCounts(seurat_milo)), , drop=FALSE]
traj_design$genotypeshort <- relevel(as.factor(traj_design$genotypeshort), "wt")

seurat_milo <- calcNhoodDistance(seurat_milo, d=30)
rownames(seurat_milo) <- seurat_milo$mouse_id
da_results <- testNhoods(seurat_milo, design = ~ genotypeshort, design.df = traj_design)

Full error traceback

Error in S4Vectors:::normarg_names(value, class(x), length(x)) : attempt to set too many names (28910) on PartitioningByEnd object of length 24137
11. stop(wmsg("attempt to set too many names (", names_len, ") ", "on ", x_class, " object of length ", x_len)) at normarg-utils.R#247
10. S4Vectors:::normarg_names(value, class(x), length(x)) at IRanges-class.R#304
9. names<-(*tmp*, value = value)
8. names<-(*tmp*, value = value) at CompressedList-class.R#96
7. METHOD(x, value) at GenomicRangesList-class.R#170
6. names<-(*tmp*, value = value[[1]])
5. names<-(*tmp*, value = value[[1]]) at RangedSummarizedExperiment-class.R#275
4. dimnames<-(*tmp*, value = dn)
3. dimnames<-(*tmp*, value = dn)
2. rownames<-(*tmp*, value = c(459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, ...
1. rownames<-(*tmp*, value = c(459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, 459L, ...

Session info

R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin23.4.0
Running under: macOS Sonoma 14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /opt/homebrew/Cellar/r/4.4.1/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices datasets utils methods base

other attached packages:
[1] SCpubr_2.0.2 miloR_2.0.0 edgeR_4.2.1
[4] limma_3.60.4 clustree_0.5.1 ggraph_2.2.1
[7] loupeR_1.1.1 patchwork_1.2.0 scater_1.32.1
[10] scuttle_1.14.0 ggtext_0.1.2 conflicted_1.2.0
[13] DESeq2_1.44.0 aggregateBioVar_1.6.0 DropletUtils_1.16.0
[16] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 Biobase_2.64.0
[19] GenomicRanges_1.56.1 GenomeInfoDb_1.40.1 IRanges_2.38.1
[22] S4Vectors_0.42.1 BiocGenerics_0.50.0 MatrixGenerics_1.16.0
[25] matrixStats_1.3.0 SoupX_1.6.2 scDblFinder_1.10.0
[28] tidyr_1.3.1 stringr_1.5.1 BiocParallel_1.30.4
[31] SCINA_1.2.0 gplots_3.1.3.1 MASS_7.3-61
[34] readr_2.1.5 ggplot2_3.5.1 magrittr_2.0.3
[37] tibble_3.2.1 purrr_1.0.2 cowplot_1.1.3
[40] SeuratObject_4.1.4 Seurat_4.4.0 dplyr_1.1.4

loaded via a namespace (and not attached):
[1] spatstat.sparse_3.1-0 bitops_1.0-8 httr_1.4.7
[4] RColorBrewer_1.1-3 numDeriv_2016.8-1.1 backports_1.5.0
[7] tools_4.4.1 sctransform_0.4.1 utf8_1.2.4
[10] R6_2.5.1 HDF5Array_1.32.0 lazyeval_0.2.2
[13] uwot_0.2.2 rhdf5filters_1.16.0 withr_3.0.1
[16] sp_2.1-4 gridExtra_2.3 progressr_0.14.0
[19] cli_3.6.3 formatR_1.14 spatstat.explore_3.3-1
[22] labeling_0.4.3 spatstat.data_3.1-2 ggridges_0.5.6
[25] pbapply_1.7-2 Rsamtools_2.20.0 R.utils_2.12.3
[28] parallelly_1.38.0 rstudioapi_0.16.0 generics_0.1.3
[31] BiocIO_1.14.0 vroom_1.6.5 gtools_3.9.5
[34] ica_1.0-3 spatstat.random_3.3-1 futile.logger_1.4.3
[37] Matrix_1.7-0 ggbeeswarm_0.7.2 fansi_1.0.6
[40] abind_1.4-5 R.methodsS3_1.8.2 lifecycle_1.0.4
[43] yaml_2.3.10 rhdf5_2.48.0 SparseArray_1.4.8
[46] Rtsne_0.17 grid_4.4.1 promises_1.3.0
[49] dqrng_0.4.1 crayon_1.5.3 miniUI_0.1.1.1
[52] lattice_0.22-6 beachmat_2.20.0 pillar_1.9.0
[55] knitr_1.48 metapod_1.12.0 rjson_0.2.21
[58] xgboost_1.7.8.1 future.apply_1.11.2 codetools_0.2-20
[61] leiden_0.4.3.1 glue_1.7.0 spatstat.univar_3.0-0
[64] data.table_1.15.4 vctrs_0.6.5 png_0.1-8
[67] gtable_0.3.5 cachem_1.1.0 xfun_0.46
[70] S4Arrays_1.4.1 mime_0.12 tidygraph_1.3.1
[73] survival_3.7-0 statmod_1.5.0 bluster_1.14.0
[76] fitdistrplus_1.2-1 ROCR_1.0-11 nlme_3.1-165
[79] bit64_4.0.5 RcppAnnoy_0.0.22 irlba_2.3.5.1
[82] vipor_0.4.7 KernSmooth_2.23-24 colorspace_2.1-1
[85] ggrastr_1.0.2 tidyselect_1.2.1 bit_4.0.5
[88] compiler_4.4.1 curl_5.2.1 BiocNeighbors_1.22.0
[91] hdf5r_1.3.11 xml2_1.3.6 DelayedArray_0.30.1
[94] plotly_4.10.4 rtracklayer_1.64.0 checkmate_2.3.2
[97] scales_1.3.0 caTools_1.18.2 lmtest_0.9-40
[100] digest_0.6.36 goftest_1.2-3 spatstat.utils_3.0-5
[103] XVector_0.44.0 htmltools_0.5.8.1 pkgconfig_2.0.3
[106] sparseMatrixStats_1.16.0 fastmap_1.2.0 rlang_1.1.4
[109] htmlwidgets_1.6.4 UCSC.utils_1.0.0 shiny_1.9.1
[112] DelayedMatrixStats_1.26.0 farver_2.1.2 zoo_1.8-12
[115] jsonlite_1.8.8 R.oo_1.26.0 BiocSingular_1.20.0
[118] RCurl_1.98-1.16 GenomeInfoDbData_1.2.12 Rhdf5lib_1.26.0
[121] munsell_0.5.1 Rcpp_1.0.13 viridis_0.6.5
[124] reticulate_1.38.0 stringi_1.8.4 zlibbioc_1.50.0
[127] plyr_1.8.9 parallel_4.4.1 listenv_0.9.1
[130] ggrepel_0.9.5 deldir_2.0-4 graphlayouts_1.1.1
[133] Biostrings_2.72.1 splines_4.4.1 gridtext_0.1.5
[136] tensor_1.5 hms_1.1.3 locfit_1.5-9.10
[139] igraph_2.0.3 spatstat.geom_3.3-2 reshape2_1.4.4
[142] ScaledMatrix_1.12.0 futile.options_1.0.1 XML_3.99-0.17
[145] lambda.r_1.2.4 scran_1.32.0 renv_1.0.7
[148] BiocManager_1.30.23 tweenr_2.0.3 tzdb_0.4.0
[151] httpuv_1.6.15 RANN_2.6.1 polyclip_1.10-7
[154] future_1.34.0 scattermore_1.2 ggforce_0.4.2
[157] rsvd_1.0.5 xtable_1.8-4 restfulr_0.0.15
[160] later_1.3.2 viridisLite_0.4.2 memoise_2.0.1
[163] beeswarm_0.4.0 GenomicAlignments_1.40.0 cluster_2.1.6
[166] globals_0.16.3

MikeDMorgan commented 3 days ago

@avfentor it looks like you are trying to set the rownames of the Milo object using the colData() variable mouse_id, which also matches your "sample_id". The rownames of a Milo object are the genes, not the cell barcodes or the experimental sample IDs.
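
To illustrate the point about dimensions, here is a minimal base-R sketch using a toy genes-by-cells matrix as a stand-in for the Milo object. Everything in it (counts, cell_meta, design_df, the gene and cell names, and the second mouse ID 460) is made up for illustration and is not taken from the dataset in this issue. It only shows that a per-cell variable has one value per column, so it cannot serve as row names, and that sample IDs can instead label the rows of a design data frame, as the traj_design code above already does.

```r
# Toy stand-in for a Milo/SingleCellExperiment object: genes in rows, cells in columns.
# All object names and values here are hypothetical.
set.seed(1)
n_genes <- 5
n_cells <- 8
counts <- matrix(
    rpois(n_genes * n_cells, lambda = 3),
    nrow = n_genes, ncol = n_cells,
    dimnames = list(paste0("gene", seq_len(n_genes)),
                    paste0("cell", seq_len(n_cells)))
)

# Per-cell metadata, analogous to colData(): one row per cell.
cell_meta <- data.frame(
    mouse_id = rep(c(459L, 460L), each = 4),
    row.names = colnames(counts)
)

# A per-cell variable has one value per column, not one per row.
stopifnot(length(cell_meta$mouse_id) == ncol(counts))
stopifnot(length(cell_meta$mouse_id) != nrow(counts))

# Using it as row names fails because the lengths differ.
res <- tryCatch({
    rownames(counts) <- cell_meta$mouse_id
    "assigned"
}, error = function(e) "error")

# Sample IDs belong on the rows of the design data frame: one row per sample.
design_df <- unique(cell_meta[, "mouse_id", drop = FALSE])
rownames(design_df) <- design_df$mouse_id
```

In the code from the original post, this would correspond to dropping the "rownames(seurat_milo) <- seurat_milo$mouse_id" line and keeping the rownames(traj_design) assignment, on the assumption that nothing else in the workflow relies on that line.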