MarioniLab / miloR

R package implementation of Milo for testing for differential abundance in KNN graphs
https://bioconductor.org/packages/release/bioc/html/miloR.html
GNU General Public License v3.0
333 stars 22 forks source link

merging Nhood with logFC of opposite sign #270

Open gianfilippo opened 1 year ago

gianfilippo commented 1 year ago

Hi,

I am running the groupNhoods function on the da_results data.frame. The da_results shows the following logFC logCPM F PValue FDR Nhood SpatialFDR 946 6.555608 7.725746 20.33434 3.864789e-04 0.2744 946 0.2281628 1221 -2.679248 8.772024 26.29051 9.286647e-05 0.2744 1221 0.2281628 2178 -3.034338 8.446574 20.82294 3.465521e-04 0.2744 2178 0.2281628 2248 -2.549574 8.700925 20.51058 3.239712e-04 0.2744 2248 0.2281628 2291 -3.060969 8.536979 22.01882 2.303425e-04 0.2744 2291 0.2281628

da_results = groupNhoods(miloobj, da_results, da.fdr=FDR) logFC logCPM F PValue FDR Nhood SpatialFDR. NhoodGroup 946 6.555608 7.725746 20.33434 3.864789e-04 0.2744 946 0.2281628. 9 1221 -2.679248 8.772024 26.29051 9.286647e-05 0.2744 1221 0.2281628. 14 2178 -3.034338 8.446574 20.82294 3.465521e-04 0.2744 2178 0.2281628. 9 2248 -2.549574 8.700925 20.51058 3.239712e-04 0.2744 2248 0.2281628. 1 2291 -3.060969 8.536979 22.01882 2.303425e-04 0.2744 2291 0.2281628. 1

It looks like the Nhoods 946 and 2178 are placed in teh same group, even though the logFC is of opposite sign.

Note: I am using a FDR cut-off of 0.25, because in this run I only had N=2

I am running miloR version 1.5.0

Am I missing something ?

Thanks

MikeDMorgan commented 1 year ago

Hi @gianfilippo as stated on the github repo landing page please report the output of your sessionInfo() when reporting issues.

gianfilippo commented 1 year ago

sorry about it. Here it is R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default BLAS/LAPACK: /gpfs/ycga/apps/hpc/software/OpenBLAS/0.3.12-GCC-10.2.0/lib/libopenblas_haswellp-r0.3.12.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 grid stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] miloR_1.5.0 edgeR_3.40.1
[3] limma_3.54.0 HGNChelper_0.8.1
[5] ProjecTILs_3.0.1 DoubletFinder_2.0.3
[7] scDblFinder_1.13.7 djvdj_0.0.0.9000
[9] scater_1.24.0 scuttle_1.6.3
[11] SingleCellExperiment_1.20.0 SummarizedExperiment_1.26.1 [13] Biobase_2.56.0 GenomicRanges_1.48.0
[15] GenomeInfoDb_1.32.4 IRanges_2.30.1
[17] S4Vectors_0.34.0 BiocGenerics_0.42.0
[19] MatrixGenerics_1.10.0 matrixStats_0.62.0
[21] muscat_1.10.1 CIPR_0.1.0
[23] ComplexHeatmap_2.12.1 gprofiler2_0.2.1
[25] ggplot2_3.4.0 cowplot_1.1.1
[27] future_1.28.0 purrr_1.0.1
[29] openxlsx_4.2.5 patchwork_1.1.1
[31] miQC_1.4.0 SeuratWrappers_0.3.1
[33] SeuratDisk_0.0.0.9020 glmGamPoi_1.8.0
[35] SeuratObject_4.1.3 Seurat_4.3.0
[37] dplyr_1.1.0

loaded via a namespace (and not attached): [1] rsvd_1.0.5 ica_1.0-2
[3] Rsamtools_2.12.0 foreach_1.5.2
[5] lmtest_0.9-40 crayon_1.5.1
[7] rbibutils_2.2.8 MASS_7.3-57
[9] nlme_3.1-157 backports_1.4.1
[11] rlang_1.0.6 XVector_0.36.0
[13] ROCR_1.0-11 irlba_2.3.5.1
[15] nloptr_2.0.0 xgboost_1.6.0.1
[17] BiocParallel_1.32.4 rjson_0.2.21
[19] bit64_4.0.5 glue_1.6.2
[21] sctransform_0.3.5 pbkrtest_0.5.1
[23] parallel_4.2.0 vipor_0.4.5
[25] spatstat.sparse_3.0-0 AnnotationDbi_1.60.0
[27] spatstat.geom_3.0-3 tidyselect_1.2.0
[29] fitdistrplus_1.1-8 variancePartition_1.26.0 [31] XML_3.99-0.9 tidyr_1.2.0
[33] zoo_1.8-10 GenomicAlignments_1.32.1 [35] xtable_1.8-4 magrittr_2.0.3
[37] Rdpack_2.3 cli_3.6.0
[39] zlibbioc_1.42.0 dbscan_1.1-11
[41] miniUI_0.1.1.1 sp_1.5-0
[43] aod_1.3.2 shiny_1.7.1
[45] BiocSingular_1.12.0 clue_0.3-60
[47] cluster_2.1.3 caTools_1.18.2
[49] tidygraph_1.2.1 KEGGREST_1.36.3
[51] tibble_3.1.8 ggrepel_0.9.1
[53] listenv_0.8.0 Biostrings_2.64.1
[55] png_0.1-7 withr_2.5.0
[57] bitops_1.0-7 ggforce_0.4.1
[59] plyr_1.8.7 pracma_2.4.2
[61] dqrng_0.3.0 coda_0.19-4
[63] pillar_1.8.1 gplots_3.1.3
[65] GlobalOptions_0.1.2 cachem_1.0.6
[67] multcomp_1.4-18 flexmix_2.3-17
[69] hdf5r_1.3.5 GetoptLong_1.0.5
[71] DelayedMatrixStats_1.18.0 vctrs_0.5.2
[73] ellipsis_0.3.2 generics_0.1.2
[75] tools_4.2.0 beeswarm_0.4.0
[77] munsell_0.5.0 tweenr_2.0.2
[79] emmeans_1.8.1-1 DelayedArray_0.22.0
[81] fastmap_1.1.0 compiler_4.2.0
[83] abind_1.4-5 httpuv_1.6.5
[85] rtracklayer_1.56.1 plotly_4.10.0
[87] GenomeInfoDbData_1.2.8 gridExtra_2.3
[89] glmmTMB_1.1.5 lattice_0.20-45
[91] deldir_1.0-6 utf8_1.2.2
[93] later_1.3.0 jsonlite_1.8.0
[95] scales_1.2.0 ScaledMatrix_1.4.0
[97] pbapply_1.5-0 sparseMatrixStats_1.8.0
[99] estimability_1.4.1 genefilter_1.78.0
[101] lazyeval_0.2.2 promises_1.2.0.1
[103] doParallel_1.0.17 R.utils_2.11.0
[105] goftest_1.2-3 spatstat.utils_3.0-1
[107] reticulate_1.26 sandwich_3.0-1
[109] blme_1.0-5 statmod_1.4.36
[111] Rtsne_0.16 uwot_0.1.14
[113] igraph_1.3.1 survival_3.3-1
[115] numDeriv_2016.8-1.1 yaml_2.3.5
[117] htmltools_0.5.2 memoise_2.0.1
[119] modeltools_0.2-23 BiocIO_1.6.0
[121] locfit_1.5-9.5 graphlayouts_0.8.0
[123] viridisLite_0.4.0 digest_0.6.29
[125] RhpcBLASctl_0.21-247.1 mime_0.12
[127] RSQLite_2.2.12 future.apply_1.9.0
[129] remotes_2.4.2 data.table_1.14.2
[131] blob_1.2.3 R.oo_1.24.0
[133] splines_4.2.0 RCurl_1.98-1.6
[135] broom_1.0.1 hms_1.1.1
[137] colorspace_2.0-3 BiocManager_1.30.19
[139] ggbeeswarm_0.6.0 shape_1.4.6
[141] nnet_7.3-17 Rcpp_1.0.8.3
[143] RANN_2.6.1 mvtnorm_1.1-3
[145] circlize_0.4.14 fansi_1.0.3
[147] tzdb_0.3.0 parallelly_1.32.1
[149] R6_2.5.1 ggridges_0.5.3
[151] lifecycle_1.0.3 zip_2.2.0
[153] bluster_1.6.0 abdiv_0.2.0
[155] minqa_1.2.4 leiden_0.4.3
[157] Matrix_1.5-0 RcppAnnoy_0.0.19
[159] TH.data_1.1-0 RColorBrewer_1.1-3
[161] iterators_1.0.14 spatstat.explore_3.0-5
[163] TMB_1.9.1 stringr_1.4.0
[165] htmlwidgets_1.5.4 beachmat_2.12.0
[167] polyclip_1.10-4 iNEXT_3.0.0
[169] globals_0.16.1 spatstat.random_3.0-1
[171] progressr_0.11.0 codetools_0.2-18
[173] metapod_1.4.0 gtools_3.9.2
[175] prettyunits_1.1.1 R.methodsS3_1.8.1
[177] gtable_0.3.0 DBI_1.1.2
[179] tensor_1.5 httr_1.4.2
[181] KernSmooth_2.23-20 stringi_1.7.6
[183] progress_1.2.2 farver_2.1.0
[185] reshape2_1.4.4 annotate_1.74.0
[187] viridis_0.6.2 boot_1.3-28
[189] BiocNeighbors_1.14.0 lme4_1.1-29
[191] restfulr_0.0.15 readr_2.1.2
[193] geneplotter_1.74.0 scattermore_0.8
[195] DESeq2_1.36.0 scran_1.24.1
[197] bit_4.0.4 spatstat.data_3.0-0
[199] ggraph_2.1.0 pkgconfig_2.0.3
[201] lmerTest_3.1-3

MikeDMorgan commented 1 year ago

Please up date Milo the current Bioc release first.

gianfilippo commented 1 year ago

Hi, I updated miloR to 1.6.0. I still get Nhoods of opposite sign within the same NhoodGroup, if I run the following

da_results = groupNhoods(miloOBJ, da_results, da.fdr=0.05)

logFC PValue FDR Nhood SpatialFDR NhoodGroup -3.672 3.096e-06 0.002750 26 0.0015060 7 -5.267 8.613e-07 0.001912 1379 0.0007996 7 2.937 5.126e-05 0.022768 1717 0.0155335 7 7.346 7.226e-07 0.001912 1933 0.0007996 7 -7.452 1.556e-06 0.001927 3038 0.0009698 7 -4.407 4.650e-06 0.003442 3039 0.0019969 7 4.313 5.801e-06 0.003680 3806 0.0023399 3 -3.418 7.929e-05 0.032013 3873 0.0209600 7 7.148 3.307e-05 0.016318 3992 0.0106319 3 3.089 1.418e-04 0.052510 4077 0.0354637 7 -4.147 1.195e-05 0.006634 4139 0.0042020 13 -4.472 1.735e-06 0.001927 4408 0.0009698 7

What do you think ?

Thanks

MikeDMorgan commented 1 year ago

Have you previously run buildNhoodGraph on these data? If so then it will by default use this unless you also specify compute.new=TRUE. I have no idea of course because you haven't posted your code.

gianfilippo commented 1 year ago

Hi,

I start from a Seurat integrated object, scdata. DefaultAssay(scdata) <- "RNA" sce <- as.SingleCellExperiment(DietSeurat(scdata, graphs = NNgname, assays = "RNA", dimreducs = c("pca","integrated.umap"))) miloOBJ <- Milo(sce) reducedDim(miloOBJ,type="pca") = scdata[["pca"]]@cell.embeddings reducedDim(miloOBJ,type="umap") = scdata[["integrated_umap"]]@cell.embeddings miloOBJ@graph = scdata@graphs graph(miloOBJ) = graph(buildFromAdjacency(miloOBJ@graph[[NNgname]],is.binary=T,k=20)) miloOBJ <- makeNhoods(miloOBJ, prop=0.1, k=20, d=30, refined=T, reduced_dims="pca")

Then countCells calcNhoodDistance with reduced.dim="pca" miloOBJ <- buildNhoodGraph(miloOBJ)

MikeDMorgan commented 1 year ago

Ok - either don't run buildNhoodGraph, because this is agnostic to any DA testing results, or run groupNhoods(..., compute.new=TRUE).

gianfilippo commented 1 year ago

I will give a try to both options

Thanks

gianfilippo commented 1 year ago

Hi,

I tried both suggestions. Here are the results:

1) the groupNhoods(..., compute.new=TRUE) does not seem to change the results. Still getting nhoodgroups with nhoods of opposite sign.

2) I restarted from scratch, i.e. from the Seurat object. Not running buildNhoodGraph results in a different number of DA nhoods, I get 17 instead of 12. Not sure what this means. Also, still getting nhoodgroups with nhoods of opposite sign, after running da_results = groupNhoods(miloOBJ, da_results, da.fdr=FDR) I can attach the results if you like.

What do you think ? What am I doing wrong ?

gianfilippo commented 1 year ago

Hi,

did you have time to look into this issue ?

Thanks

gianfilippo commented 1 year ago

I seems that if I define subset.nhoods (= to the DA Nhoods in my case), the logFC are concordant within each group, although the logFC difference can be larger than the max.lfc.delta.

pranithavangala commented 1 year ago

Any update on this @MikeDMorgan

gianfilippo commented 10 months ago

my present solution is to use a combination of cell cluster and logFC sign to define groups. I may also consider adding grouping by logFC value (range), but have not done that yet.

I am not sure about "mixed" nhoods, should I include them in marker analysis or filter them out ?

liezeltamon commented 6 months ago

hoping for an update on this, thank you!

DarioS commented 5 months ago

Isn't it expected and not an issue? The same kind of grouping is visible in the publication e.g. Figure 5d.

image