MarioniLab / miloR

R package implementation of Milo for testing for differential abundance in KNN graphs
https://bioconductor.org/packages/release/bioc/html/miloR.html
GNU General Public License v3.0
342 stars 22 forks source link

Getting error in buildNhoodGraph #350

Open sepehrgolriz opened 2 weeks ago

sepehrgolriz commented 2 weeks ago

Hi,

I am getting an error while using the buildNhoodGraph function. The error is

Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed In addition: Warning message: In int2i(as.integer(i), n) : NAs introduced by coercion to integer range

However, there is no NA in the object: any(is.na(reducedDim(traj_milo))) [1] FALSE

These are the properties of the object:

dim(traj_milo) [1] 12136 732835

nhoods_matrix <- nhoods(traj_milo) dim(nhoods_matrix) [1] 732835 99589

summary(colSums(nhoods_matrix)) Min. 1st Qu. Median Mean 3rd Qu. Max. 12.00 16.00 22.00 27.64 33.00 325.00

summary(reducedDim(traj_milo)) PC_1 PC_2 PC_3 PC_4 PC_5
Min. :-38.27643 Min. :-11.62183 Min. :-22.55284 Min. :-33.51378 Min. :-110.40824
1st Qu.:-15.33842 1st Qu.: -2.72014 1st Qu.: -4.36499 1st Qu.: -0.94730 1st Qu.: -0.03238
Median : 9.05430 Median : -1.07223 Median : 1.78047 Median : 0.18979 Median : 0.54264
Mean : 0.00731 Mean : -0.02878 Mean : 0.02077 Mean : -0.00495 Mean : 0.16521
3rd Qu.: 10.48666 3rd Qu.: -0.12177 3rd Qu.: 5.35377 3rd Qu.: 1.78425 3rd Qu.: 1.11916
Max. : 13.95924 Max. : 35.70086 Max. : 10.20913 Max. : 14.10031 Max. : 5.00817

all(colnames(nhoodCounts(traj_milo)) %in% rownames(thy.design)) [1] TRUE

Could anyone help me with this? The size of the dataset could be a problem?

Thanks a lot for your kind help in advance.

sepehrgolriz commented 2 weeks ago

Hi,

Even with smaller dataset I get the same error:

small_traj_milo <- traj_milo[1:1000, ]

small_traj_milo <- buildNhoodGraph(small_traj_milo) Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed In addition: Warning message: In int2i(as.integer(i), n) : NAs introduced by coercion to integer range

Any help would be appreciated!

MikeDMorgan commented 2 weeks ago

Always provide the output of sessionInfo() and the code leading up to the point of the error.

sepehrgolriz commented 2 weeks ago

Yes Sir. Sorry for my bad!

Here is the the code:

combined_seurat_obj <- NormalizeData(combined_seurat_obj) combined_seurat_obj <- FindVariableFeatures(combined_seurat_obj) combined_seurat_obj <- ScaleData(combined_seurat_obj) combined_seurat_obj <- RunPCA(combined_seurat_obj)

combined_seurat_obj <- RunUMAP(combined_seurat_obj)

thy.sce <- as.SingleCellExperiment(combined_seurat_obj) thy.sce <- logNormCounts(thy.sce) thy.sce <- runUMAP(thy.sce) traj_milo <- Milo(thy.sce) reducedDim(traj_milo, "UMAP") <- reducedDim(thy.sce, "UMAP")

traj_milo <- buildGraph(traj_milo, k = 4, d = 40) traj_milo <- makeNhoods(traj_milo, prop = 0.2, k = 11, d=40, refined = TRUE, refinement_scheme="graph") # make nhoods using graph-only as this is faster traj_milo <- countCells(traj_milo, meta.data = data.frame(colData(traj_milo)), samples="orig.ident") plotNhoodSizeHist(traj_milo)

thy.design <- data.frame(colData(traj_milo))[,c("orig.ident", "disease_status")] thy.design <- distinct(thy.design) rownames(thy.design) <- thy.design$orig.ident

thy.design <- thy.design[colnames(nhoodCounts(traj_milo)), , drop=FALSE] table(thy.design$disease_status)

contrast.1 <- "disease_statusYes"

thy.design$disease_status <- factor(thy.design$disease_status, levels = c("No", "Yes"))

da_results <- testNhoods(traj_milo, design = ~ disease_status, design.df = thy.design, model.contrasts = contrast.1, fdr.weighting = "graph-overlap", norm.method = "TMM")

small_traj_milo <- traj_milo[1:1000, ] small_traj_milo <- buildNhoodGraph(small_traj_milo)

traj_milo <- buildNhoodGraph(traj_milo)

##################### anyNA(traj_milo@nhoods) [1] FALSE

is.null(traj_milo@nhoodGraph) [1] FALSE head(which(is.na(as.integer(rownames(traj_milo@nhoods))))) [1] 1 2 3 4 5 6 Warning message: In which(is.na(as.integer(rownames(traj_milo@nhoods)))) : NAs introduced by coercion head(which(is.na(as.integer(colnames(traj_milo@nhoods))))) integer(0) head(rownames(traj_milo@nhoods)) [1] "AAACAAGCAAGCGAAGACTTTAGG-1_1" "AAACAAGCACATTGTCACTTTAGG-1_1" "AAACAAGCACCTGGTCACTTTAGG-1_1" [4] "AAACAAGCAGGAATAAACTTTAGG-1_1" "AAACAAGCATCAAGTGACTTTAGG-1_1" "AAACAAGCATTACTCAACTTTAGG-1_1" is.null(traj_milo@nhoodGraph) [1] FALSE str(traj_milo@nhoodGraph) list() dim(traj_milo@nhoods) [1] 732835 99589

#######################

Here is the sessionInfo

sessionInfo() R version 4.4.0 (2024-04-24) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 22.04.4 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC tzcode source: system (glibc)

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] miloR_2.0.0 edgeR_4.2.2 scater_1.32.1 scran_1.32.0
[5] scuttle_1.14.0 BiocParallel_1.38.0 SingleCellExperiment_1.26.0 RColorBrewer_1.1-3
[9] DESeq2_1.44.0 SummarizedExperiment_1.34.0 Biobase_2.64.0 MatrixGenerics_1.16.0
[13] matrixStats_1.4.1 GenomicRanges_1.56.2 GenomeInfoDb_1.40.1 IRanges_2.38.1
[17] S4Vectors_0.42.1 BiocGenerics_0.50.0 sctransform_0.4.1 purrr_1.0.2
[21] muscat_1.18.0 limma_3.60.6 ggplot2_3.5.1 dplyr_1.1.4
[25] Azimuth_0.5.0 shinyBS_0.61.1 SeuratWrappers_0.3.5 Seurat_5.1.0
[29] SeuratObject_5.0.2 sp_2.1-4

loaded via a namespace (and not attached): [1] R.methodsS3_1.8.2 progress_1.2.3 poweRlaw_0.80.0
[4] goftest_1.2-3 DT_0.33 Biostrings_2.72.1
[7] vctrs_0.6.5 spatstat.random_3.3-2 pbmcref.SeuratData_1.0.0
[10] corpcor_1.6.10 digest_0.6.37 png_0.1-8
[13] shape_1.4.6.1 ggrepel_0.9.6 deldir_2.0-4
[16] parallelly_1.38.0 MASS_7.3-60.2 Signac_1.14.0
[19] reshape2_1.4.4 httpuv_1.6.15 foreach_1.5.2
[22] withr_3.0.1 survival_3.6-4 EnsDb.Hsapiens.v86_2.99.0
[25] memoise_2.0.1 ggbeeswarm_0.7.2 zoo_1.8-12
[28] GlobalOptions_0.1.2 gtools_3.9.5 pbapply_1.7-2
[31] R.oo_1.26.0 prettyunits_1.2.0 KEGGREST_1.44.1
[34] promises_1.3.0 httr_1.4.7 restfulr_0.0.15
[37] globals_0.16.3 fitdistrplus_1.2-1 rhdf5filters_1.16.0
[40] rhdf5_2.48.0 rstudioapi_0.16.0 UCSC.utils_1.0.0
[43] miniUI_0.1.1.1 generics_0.1.3 curl_5.2.3
[46] zlibbioc_1.50.0 ggraph_2.2.1 ScaledMatrix_1.12.0
[49] polyclip_1.10-7 GenomeInfoDbData_1.2.12 SparseArray_1.4.8
[52] xtable_1.8-4 stringr_1.5.1 pracma_2.4.4
[55] doParallel_1.0.17 S4Arrays_1.4.1 hms_1.1.3
[58] irlba_2.3.5.1 colorspace_2.1-1 hdf5r_1.3.11
[61] ROCR_1.0-11 reticulate_1.39.0 spatstat.data_3.1-2
[64] magrittr_2.0.3 lmtest_0.9-40 readr_2.1.5
[67] viridis_0.6.5 later_1.3.2 lattice_0.22-6
[70] spatstat.geom_3.3-3 future.apply_1.11.2 scattermore_1.2
[73] XML_3.99-0.17 cowplot_1.1.3 RcppAnnoy_0.0.22
[76] pillar_1.9.0 nlme_3.1-164 iterators_1.0.14
[79] pwalign_1.0.0 beachmat_2.20.0 caTools_1.18.3
[82] compiler_4.4.0 RSpectra_0.16-2 stringi_1.8.4
[85] tensor_1.5 minqa_1.2.8 GenomicAlignments_1.40.0
[88] plyr_1.8.9 crayon_1.5.3 abind_1.4-8
[91] BiocIO_1.14.0 blme_1.0-6 googledrive_2.1.1
[94] locfit_1.5-9.10 graphlayouts_1.2.0 bit_4.5.0
[97] fastmatch_1.1-4 codetools_0.2-20 BiocSingular_1.20.0
[100] SeuratData_0.2.2.9001 GetoptLong_1.0.5 plotly_4.10.4
[103] remaCor_0.0.18 mime_0.12 splines_4.4.0
[106] circlize_0.4.16 Rcpp_1.0.13 fastDummies_1.7.4
[109] sparseMatrixStats_1.16.0 cellranger_1.1.0 blob_1.2.4
[112] utf8_1.2.4 clue_0.3-65 seqLogo_1.70.0
[115] AnnotationFilter_1.28.0 lme4_1.1-35.5 fs_1.6.4
[118] listenv_0.9.1 DelayedMatrixStats_1.26.0 Rdpack_2.6.1
[121] tibble_3.2.1 Matrix_1.7-0 statmod_1.5.0
[124] tzdb_0.4.0 tweenr_2.0.3 fANCOVA_0.6-1
[127] pkgconfig_2.0.3 tools_4.4.0 cachem_1.1.0
[130] RhpcBLASctl_0.23-42 rbibutils_2.3 RSQLite_2.3.7
[133] viridisLite_0.4.2 DBI_1.2.3 numDeriv_2016.8-1.1
[136] fastmap_1.2.0 scales_1.3.0 grid_4.4.0
[139] ica_1.0-3 shinydashboard_0.7.2 Rsamtools_2.20.0
[142] broom_1.0.7 patchwork_1.3.0 BiocManager_1.30.25
[145] dotCall64_1.2 RANN_2.6.2 farver_2.1.2
[148] reformulas_0.3.0 tidygraph_1.3.1 aod_1.3.3
[151] mgcv_1.9-1 yaml_2.3.10 rtracklayer_1.64.0
[154] cli_3.6.3 leiden_0.4.3.1 lifecycle_1.0.4
[157] uwot_0.2.2 glmmTMB_1.1.10 mvtnorm_1.3-1
[160] bluster_1.14.0 backports_1.5.0 presto_1.0.0
[163] BSgenome.Hsapiens.UCSC.hg38_1.4.5 annotate_1.82.0 gtable_0.3.5
[166] rjson_0.2.23 ggridges_0.5.6 progressr_0.14.0
[169] jsonlite_1.8.8 RcppHNSW_0.6.0 TFBSTools_1.42.0
[172] bitops_1.0-9 bit64_4.5.2 Rtsne_0.17
[175] BiocNeighbors_1.22.0 spatstat.utils_3.1-0 CNEr_1.40.0
[178] metapod_1.12.0 dqrng_0.4.1 shinyjs_2.1.0
[181] SeuratDisk_0.0.0.9021 spatstat.univar_3.0-1 R.utils_2.12.3
[184] pbkrtest_0.5.3 lazyeval_0.2.2 shiny_1.9.1
[187] htmltools_0.5.8.1 GO.db_3.19.1 rappdirs_0.3.3
[190] ensembldb_2.28.1 glue_1.7.0 TFMPvalue_0.0.9
[193] spam_2.11-0 googlesheets4_1.1.1 XVector_0.44.0
[196] RCurl_1.98-1.16 BSgenome_1.72.0 gridExtra_2.3
[199] EnvStats_3.0.0 boot_1.3-30 variancePartition_1.34.0
[202] JASPAR2020_0.99.10 igraph_2.0.3 TMB_1.9.15
[205] R6_2.5.1 gplots_3.2.0 tidyr_1.3.1
[208] RcppRoll_0.3.1 labeling_0.4.3 GenomicFeatures_1.56.0
[211] cluster_2.1.6 Rhdf5lib_1.26.0 gargle_1.5.2
[214] nloptr_2.1.1 DirichletMultinomial_1.46.0 vipor_0.4.7
[217] DelayedArray_0.30.1 tidyselect_1.2.1 ProtGenerics_1.36.0
[220] ggforce_0.4.2 AnnotationDbi_1.66.0 future_1.34.0
[223] rsvd_1.0.5 munsell_0.5.1 KernSmooth_2.23-22
[226] data.table_1.16.2 htmlwidgets_1.6.4 ComplexHeatmap_2.20.0
[229] rlang_1.1.4 spatstat.sparse_3.1-0 spatstat.explore_3.3-2
[232] lmerTest_3.1-3 remotes_2.5.0 fansi_1.0.6
[235] beeswarm_0.4.0

MikeDMorgan commented 2 weeks ago

Firstly, a k=4 is extremely small for building a kNN-graph given you have >700k cells. You are then using a different value for k in the nhood definition, so you most likely don't have many connections between your cells. Inspect your nhoodSize histogram and aim for ~5xS, where S is the number of experimental samples, OR a median nhood size of ~100 if you have lots of samples, i.e. >20.

Secondly, subsetting the Milo object on the rows, subsets on the genes, not the cells.

sepehrgolriz commented 2 weeks ago

First of all, thanks a lot for your kind help.

Despite the changes I did in the code, I am still getting the same error:

Here is the changes:

traj_milo <- buildGraph(traj_milo, k = 40, d = 40) # Increased k Constructing kNN graph with k:40 traj_milo <- makeNhoods(traj_milo, prop = 0.12, k = 40, d = 40, refined = TRUE, refinement_scheme = "graph") Checking valid object Running refined sampling with graph traj_milo <- countCells(traj_milo, meta.data = data.frame(colData(traj_milo)), samples = "orig.ident") Checking meta.data validity Counting cells in neighbourhoods plotNhoodSizeHist(traj_milo) image

Check the median neighborhood size

median_nhood_size <- median(colSums(nhoods(traj_milo))) print(paste("Median neighborhood size:", median_nhood_size)) [1] "Median neighborhood size: 106"

traj_milo <- buildNhoodGraph(traj_milo) Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed In addition: Warning message: In int2i(as.integer(i), n) : NAs introduced by coercion to integer range

Any chance to have more of your kind help?

Thanks a in advance.

MikeDMorgan commented 2 weeks ago

Can you run the following code without error:

nh_intersect_mat <- miloR:::.build_nhood_adjacency(nhoods(traj_milo))
sepehrgolriz commented 2 weeks ago

Unfortunately no! Got the same error as before:

nh_intersect_mat <- miloR:::.build_nhood_adjacency(nhoods(traj_milo))

Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed In addition: Warning message: In int2i(as.integer(i), n) : NAs introduced by coercion to integer range

sepehrgolriz commented 1 week ago

Dear Mike, I was wondering if I can have more of your kind help?!

Thanks a lot in advance.

MikeDMorgan commented 1 day ago

How many nhoods do you have in this object, and what is the provenance of the data?