MarioniLab / miloDE

Framework for sensitive DE testing (using neighbourhoods)
Other
57 stars 2 forks source link

Problem with get-wgcna_modules #33

Closed HeleneHemmer closed 7 months ago

HeleneHemmer commented 9 months ago

Dear miloDE team,

thanks a lot for developing the tool! Currently, I play around with your sample data from mouse gastrulation but also my own data. Following your vignette, I run into an error, when trying to use the get_wgcna_modules function. This is the error message:

modules_wgcna = suppressMessages(get_wgcna_modules(convert_de_stat(de_stat) , n_hoods_sig.thresh = 4)) Warning: Data is of class data.frame. Coercing to dgCMatrix. Calculating gene variances 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **| Calculating feature variances of standardized and clipped values 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **| Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'as.matrix': no slot of name "data" for this object of class "Assay5" In addition: Warning messages: 1: In eval(predvars, data, env) : NaNs produced 2: In hvf.info$variance.expected[not.const] <- 10^fit$fitted : number of items to replace is not a multiple of replacement length Called from: h(simpleError(msg, call)) Browse[1]>

The browser window that is opening in RStudio shows the following error messages:

Bildschirmfoto 2023-12-15 um 14 23 03

I was trying to troubleshoot the function line by line, but I immediately run into problems when focussing on genes in 2 neighbourhoods:

de_stat_per_gene = as.data.frame(de_stat %>% group_by(gene) %>% dplyr::summarise(n_hoods_sig = sum(pval_corrected_across_nhoods < pval.thresh , na.rm = TRUE))) Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': no applicable method for 'group_by' applied to an object of class "c('SingleCellExperiment', 'RangedSummarizedExperiment', 'SummarizedExperiment', 'RectangularData', 'Vector', 'Annotated', 'vector_OR_Vector')"

I adapted the code like this to make it work:

de_stat_per_gene = as.data.frame(convert_de_stat(de_stat) %>% group_by(gene) %>% dplyr::summarise(n_hoods_sig = sum(pval_corrected_across_nhoods < 0.1 , na.rm = TRUE)))

But when executing the following command I receive an error message:

de_stat = de_stat[de_stat$gene %in% genes_sig, ] de_stat = de_stat[order(de_stat$Nhood) , ] Error: subscript contains out-of-bounds indices

That is why I did not run the selection for the signature genes:

de_stat = de_stat[de_stat$gene %in% genes_sig, ]

In addition, in the following lines of code, I was not able to access the de_stat object in the way you describe it in the code, e.g.:

de_stat$logFC[is.na(de_stat$logFC)] = 0 Error in [[<-(*tmp*, name, value = numeric(0)) : 0 elements in value to replace 42 elements

Here I changed the code to:

de_stat@assays@data$logFC[is.na(de_stat@assays@data$logFC)] = 0

The next issue I faced was the following:

de_stat = reshape2::dcast(data = de_stat, formula = gene~Nhood, value.var = "logFC") Error: value.var (logFC) not found in input

Here I changed the code to:

if ("logFC" %in% names(assays(de_stat))) { logFC_values <- assays(de_stat)$logFC logFC_data <- data.frame(gene = rownames(de_stat), logFC_values) melted_data <- reshape2::melt(logFC_data, id.vars = "gene", variable.name = "Nhood", value.name = "logFC") casted_data <- reshape2::dcast(data = melted_data, formula = gene ~ Nhood, value.var = "logFC") } else { warning("The 'logFC' column is missing in 'de_stat'. Please check the structure of your data.") }

Finally, when creating the SeuratObject I receive the following error:

obj.seurat <- CreateSeuratObject(counts = de_stat) Warning: Data is of class SingleCellExperiment. Coercing to dgCMatrix. Error in UseMethod(generic = "as.sparse", object = x) : no applicable method for 'as.sparse' applied to an object of class "c('SingleCellExperiment', 'RangedSummarizedExperiment', 'SummarizedExperiment', 'RectangularData', 'Vector', 'Annotated', 'vector_OR_Vector')"

I would very much appreciate your help. Currently, I use Seurat version 5.0.1, scWGCNA v1.0.0 and R version is 4.3.2. If there is any information missing, that I would need to provide, please don't hesitate to ask!

Thanks a lot!

amissarova commented 8 months ago

@HeleneHemmer , its a bit hard to follow through all the debugging steps - would it be possible to share a test object so I could try to debug it on my end?

amissarova commented 8 months ago

@HeleneHemmer , in parallel, can you check next things (looking through the error messages, think that might be an issue):

  1. ensure that at least few genes have number of significant neighbourhoods as the threshold you choose (4 in your case). If no - thats probably what the problem is; If yes - does the code still run if you decrease threshold? Overall - i would play with different thresholds here and find out whether the bug persists or no. So being said, practically sometimes the scWGCNA throws an error for certain data, Im not entirely sure which properties force it to break - but i saw that altering the selection of genes can redeem a situation (i would try both increasing and decreasing a threshold).

  2. i would also ensure that the type of the main variable is data.frame (for this purpose, I run convert_de_stat - coz the original de_stat is not a data frame) - can you check that this is indeed the case?

Lmk how this goes? If nothing looks suspicious here, pls send the test input and we can look at this together

amissarova commented 8 months ago

and also, can you tell me what are the colnames you have of your data frame

HeleneHemmer commented 8 months ago

@amissarova Thanks for your response. I tried changing the parameters in "n_hoods_sig.thresh" in the function get_wgcna_modules to 1, 2, 4, 6 and 10 but it did not solve the error for me.

The de_stat object looks like this:

class(de_stat) [1] "data.frame" colnames(de_stat) [1] "Nhood" "gene" "logFC" "pval"
[5] "pval_corrected_across_genes" "pval_corrected_across_nhoods" "Nhood_center" "test_performed"

I also invited you to a private github repositiory where you can find the code and the de_stat file. I hope you can access it, if not, please let me know!

This my session info if that helps in any way, too:

sessionInfo() R version 4.3.2 (2023-10-31) Platform: x86_64-apple-darwin20 (64-bit) Running under: macOS Sonoma 14.2.1

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin tzcode source: internal

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] MouseGastrulationData_1.16.0 SpatialExperiment_1.12.0 ggpubr_0.6.0
[4] viridis_0.6.4 viridisLite_0.4.2 scWGCNA_1.0.0
[7] reshape2_1.4.4 scran_1.30.0 uwot_0.1.16
[10] scuttle_1.12.0 SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0 [13] Biobase_2.62.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.2
[16] IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1
[19] MatrixGenerics_1.14.0 matrixStats_1.2.0 miloR_1.10.0
[22] edgeR_4.0.3 limma_3.58.1 miloDE_0.0.0.9000
[25] knitr_1.45 tidyr_1.3.0 stringi_1.8.3
[28] stringr_1.5.1 openxlsx_4.2.5.2 Matrix_1.6-4
[31] RColorBrewer_1.1-3 ggrepel_0.9.4 scales_1.3.0
[34] ggplot2_3.4.4 dplyr_1.1.4 glmGamPoi_1.14.0
[37] sctransform_0.4.1 SeuratDisk_0.0.0.9021 Seurat_5.0.1
[40] SeuratObject_5.0.1 sp_2.1-2

loaded via a namespace (and not attached): [1] dichromat_2.0-0.1 nnet_7.3-19 goftest_1.2-3
[4] Biostrings_2.70.1 vctrs_0.6.5 spatstat.random_3.2-2
[7] digest_0.6.33 png_0.1-8 Augur_1.0.3
[10] deldir_2.0-2 parallelly_1.36.0 magick_2.8.2
[13] MASS_7.3-60 httpuv_1.6.13 foreach_1.5.2
[16] withr_2.5.2 xfun_0.41 ellipsis_0.3.2
[19] survival_3.5-7 memoise_2.0.1 ggbeeswarm_0.7.2
[22] parsnip_1.1.1 zoo_1.8-12 gtools_3.9.5
[25] pbapply_1.7-2 Formula_1.2-5 KEGGREST_1.42.0
[28] promises_1.2.1 httr_1.4.7 rstatix_0.7.2
[31] globals_0.16.2 fitdistrplus_1.1-11 rstudioapi_0.15.0
[34] miniUI_0.1.1.1 generics_0.1.3 base64enc_0.1-3
[37] curl_5.2.0 zlibbioc_1.48.0 ScaledMatrix_1.10.0
[40] ggraph_2.1.0 polyclip_1.10-6 randomForest_4.7-1.1
[43] GenomeInfoDbData_1.2.11 ExperimentHub_2.10.0 SparseArray_1.2.2
[46] interactiveDisplayBase_1.40.0 xtable_1.8-4 doParallel_1.0.17
[49] evaluate_0.23 S4Arrays_1.2.0 preprocessCore_1.64.0
[52] BiocFileCache_2.10.1 irlba_2.3.5.1 colorspace_2.1-0
[55] filelock_1.0.3 hdf5r_1.3.8 ROCR_1.0-11
[58] reticulate_1.34.0 spatstat.data_3.0-3 magrittr_2.0.3
[61] lmtest_0.9-40 later_1.3.2 lattice_0.22-5
[64] mapproj_1.2.11 spatstat.geom_3.2-7 future.apply_1.11.1
[67] scattermore_1.2 cowplot_1.1.2 RcppAnnoy_0.0.21
[70] class_7.3-22 Hmisc_5.1-1 pillar_1.9.0
[73] nlme_3.1-164 iterators_1.0.14 compiler_4.3.2
[76] beachmat_2.18.0 RSpectra_0.16-1 gower_1.0.1
[79] tensor_1.5 lubridate_1.9.3 plyr_1.8.9
[82] crayon_1.5.2 abind_1.4-5 locfit_1.5-9.8
[85] pals_1.8 graphlayouts_1.0.2 bit_4.0.5
[88] RcppGreedySetCover_0.1.0 fastcluster_1.2.3 codetools_0.2-19
[91] recipes_1.0.9 BiocSingular_1.18.0 plotly_4.10.3
[94] mime_0.12 rsample_1.2.0 splines_4.3.2
[97] Rcpp_1.0.11 fastDummies_1.7.3 dbplyr_2.4.0
[100] sparseMatrixStats_1.14.0 BumpyMatrix_1.10.0 blob_1.2.4
[103] utf8_1.2.4 BiocVersion_3.18.1 checkmate_2.3.1
[106] listenv_0.9.0 DelayedMatrixStats_1.24.0 ggsignif_0.6.4
[109] tibble_3.2.1 statmod_1.5.0 tweenr_2.0.2
[112] pkgconfig_2.0.3 tools_4.3.2 cachem_1.0.8
[115] RSQLite_2.3.4 DBI_1.2.0 impute_1.76.0
[118] rmarkdown_2.25 fastmap_1.1.1 grid_4.3.2
[121] pbmcapply_1.5.1 ica_1.0-3 broom_1.0.5
[124] AnnotationHub_3.10.0 patchwork_1.1.3 BiocManager_1.30.22
[127] dotCall64_1.1-1 carData_3.0-5 RANN_2.6.1
[130] rpart_4.1.23 farver_2.1.1 tidygraph_1.3.0
[133] yaml_2.3.8 foreign_0.8-86 cli_3.6.2
[136] purrr_1.0.2 tester_0.1.7 leiden_0.4.3.1
[139] lifecycle_1.0.4 bluster_1.12.0 lava_1.7.3
[142] backports_1.4.1 BiocParallel_1.36.0 timechange_0.2.0
[145] gtable_0.3.4 rjson_0.2.21 ggridges_0.5.5
[148] yardstick_1.2.0 progressr_0.14.0 parallel_4.3.2
[151] jsonlite_1.8.8 RcppHNSW_0.5.0 bitops_1.0-7
[154] bit64_4.0.5 Rtsne_0.17 spatstat.utils_3.0-4
[157] BiocNeighbors_1.20.1 zip_2.3.0 highr_0.10
[160] metapod_1.10.0 dqrng_0.3.2 timeDate_4032.109
[163] lazyeval_0.2.2 shiny_1.8.0 dynamicTreeCut_1.63-1
[166] htmltools_0.5.7 GO.db_3.18.0 rappdirs_0.3.3
[169] glue_1.6.2 spam_2.10-0 XVector_0.42.0
[172] RCurl_1.98-1.13 gridExtra_2.3 igraph_1.6.0
[175] R6_2.5.1 labeling_0.4.3 cluster_2.1.6
[178] ipred_0.9-14 DelayedArray_0.28.0 tidyselect_1.2.0
[181] vipor_0.4.7 WGCNA_1.72-5 htmlTable_2.4.2
[184] maps_3.4.2 ggforce_0.4.1 car_3.1-2
[187] AnnotationDbi_1.64.1 future_1.33.1 rsvd_1.0.5
[190] munsell_0.5.0 KernSmooth_2.23-22 furrr_0.3.1
[193] data.table_1.14.10 htmlwidgets_1.6.4 rlang_1.1.2
[196] spatstat.sparse_3.0-3 spatstat.explore_3.2-5 fansi_1.0.6
[199] hardhat_1.3.0 beeswarm_0.4.0 prodlim_2023.08.28

amissarova commented 7 months ago

@HeleneHemmer I think i might know what is going on - as it turns out, in the latest Seurat, that handling of counts and normalised counts assay is done differently, and the originally adapted code from scWGCNA doesnt work with it. Im currently trying to think, whether it is possible to update it on my end so it is compatible, however, im not sure it will be since scWGCNA itself it seems relies on earlier version.

As a temporarily solution, can you try to downgrade Seurat to V4.4.0 for example and see if that helps?

HeleneHemmer commented 7 months ago

@amissarova thank you for the info, that was my concern as well. I'll try it with v.4.4.0 again and let you know how it goes

amissarova commented 7 months ago

@HeleneHemmer if you decide to try it, pls let me know how it goes - then i ll need to decide on how to proceed with this package incompatibility issue

HeleneHemmer commented 7 months ago

@amissarova yes, I will for sure!

HeleneHemmer commented 7 months ago

@amissarova Hi! I ran the code again with Seurat version 4, but encountered this new error:

modules_wgcna = suppressMessages(get_wgcna_modules(convert_de_stat(de_stat) , n_hoods_sig.thresh = 4)) Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': no applicable method for 'group_by' applied to an object of class "c('SingleCellExperiment', 'RangedSummarizedExperiment', 'SummarizedExperiment', 'RectangularData', 'Vector', 'Annotated', 'vector_OR_Vector')" Called from: h(simpleError(msg, call)) Browse[1]> c

Therefore, I removed the "convert_de_stat(de_stat)" in get_wgcna_modules and now it seems to work...

This is my session info:

sessionInfo() R version 4.3.2 (2023-10-31) Platform: x86_64-apple-darwin20 (64-bit) Running under: macOS Sonoma 14.2.1

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin tzcode source: internal

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] ggpubr_0.6.0 ggplot2_3.4.4 viridis_0.6.4 viridisLite_0.4.2
[5] scWGCNA_1.0.0 reshape2_1.4.4 dplyr_1.1.4 scran_1.30.1
[9] uwot_0.1.16 Matrix_1.6-5 scuttle_1.12.0 SingleCellExperiment_1.24.0 [13] SummarizedExperiment_1.32.0 Biobase_2.62.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.5
[17] IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 MatrixGenerics_1.14.0
[21] matrixStats_1.2.0 miloR_1.10.0 edgeR_4.0.12 limma_3.58.1
[25] miloDE_0.0.0.9000 knitr_1.45 SeuratObject_4.1.4 Seurat_4.4.0

loaded via a namespace (and not attached): [1] spatstat.sparse_3.0-3 bitops_1.0-7 lubridate_1.9.3 doParallel_1.0.17 httr_1.4.7
[6] RColorBrewer_1.1-3 dynamicTreeCut_1.63-1 tools_4.3.2 sctransform_0.4.1 backports_1.4.1
[11] utf8_1.2.4 R6_2.5.1 lazyeval_0.2.2 yardstick_1.3.0 withr_3.0.0
[16] sp_2.1-2 gridExtra_2.3 preprocessCore_1.64.0 progressr_0.14.0 WGCNA_1.72-5
[21] cli_3.6.2 spatstat.explore_3.2-5 labeling_0.4.3 spatstat.data_3.0-4 randomForest_4.7-1.1
[26] RcppGreedySetCover_0.1.0 ggridges_0.5.6 pbapply_1.7-2 foreign_0.8-86 dichromat_2.0-0.1
[31] parallelly_1.36.0 maps_3.4.2 impute_1.76.0 RSQLite_2.3.5 rstudioapi_0.15.0
[36] pals_1.8 generics_0.1.3 gtools_3.9.5 ica_1.0-3 spatstat.random_3.2-2
[41] car_3.1-2 GO.db_3.18.0 ggbeeswarm_0.7.2 fansi_1.0.6 abind_1.4-5
[46] lifecycle_1.0.4 carData_3.0-5 glmGamPoi_1.14.0 recipes_1.0.9 SparseArray_1.2.3
[51] Rtsne_0.17 blob_1.2.4 grid_4.3.2 promises_1.2.1 dqrng_0.3.2
[56] crayon_1.5.2 miniUI_0.1.1.1 lattice_0.22-5 beachmat_2.18.0 cowplot_1.1.3
[61] KEGGREST_1.42.0 mapproj_1.2.11 pillar_1.9.0 metapod_1.10.1 future.apply_1.11.1
[66] codetools_0.2-19 Augur_1.0.3 leiden_0.4.3.1 glue_1.7.0 rsample_1.2.0
[71] data.table_1.14.10 vctrs_0.6.5 png_0.1-8 gtable_0.3.4 cachem_1.0.8
[76] gower_1.0.1 xfun_0.41 S4Arrays_1.2.0 mime_0.12 prodlim_2023.08.28
[81] tidygraph_1.3.0 survival_3.5-7 timeDate_4032.109 iterators_1.0.14 pbmcapply_1.5.1
[86] hardhat_1.3.0 lava_1.7.3 statmod_1.5.0 bluster_1.12.0 ellipsis_0.3.2
[91] fitdistrplus_1.1-11 ROCR_1.0-11 ipred_0.9-14 nlme_3.1-164 bit64_4.0.5
[96] RcppAnnoy_0.0.22 irlba_2.3.5.1 vipor_0.4.7 KernSmooth_2.23-22 rpart_4.1.23
[101] Hmisc_5.1-1 DBI_1.2.1 colorspace_2.1-0 nnet_7.3-19 tidyselect_1.2.0
[106] bit_4.0.5 compiler_4.3.2 htmlTable_2.4.2 BiocNeighbors_1.20.2 DelayedArray_0.28.0
[111] plotly_4.10.4 checkmate_2.3.1 scales_1.3.0 lmtest_0.9-40 stringr_1.5.1
[116] digest_0.6.34 goftest_1.2-3 spatstat.utils_3.0-4 rmarkdown_2.25 XVector_0.42.0
[121] base64enc_0.1-3 htmltools_0.5.7 pkgconfig_2.0.3 sparseMatrixStats_1.14.0 fastmap_1.1.1
[126] rlang_1.1.3 htmlwidgets_1.6.4 shiny_1.8.0 DelayedMatrixStats_1.24.0 farver_2.1.1
[131] zoo_1.8-12 jsonlite_1.8.8 BiocParallel_1.36.0 BiocSingular_1.18.0 RCurl_1.98-1.14
[136] magrittr_2.0.3 Formula_1.2-5 GenomeInfoDbData_1.2.11 patchwork_1.2.0 munsell_0.5.0
[141] Rcpp_1.0.12 reticulate_1.34.0 furrr_0.3.1 stringi_1.8.3 ggraph_2.1.0
[146] zlibbioc_1.48.0 MASS_7.3-60.0.1 plyr_1.8.9 parallel_4.3.2 listenv_0.9.0
[151] ggrepel_0.9.5 deldir_2.0-2 Biostrings_2.70.1 graphlayouts_1.1.0 splines_4.3.2
[156] tensor_1.5 locfit_1.5-9.8 fastcluster_1.2.6 igraph_1.6.0 spatstat.geom_3.2-8
[161] ggsignif_0.6.4 parsnip_1.1.1 ScaledMatrix_1.10.0 evaluate_0.23 tester_0.1.7
[166] foreach_1.5.2 tweenr_2.0.2 httpuv_1.6.13 RANN_2.6.1 tidyr_1.3.1
[171] purrr_1.0.2 polyclip_1.10-6 future_1.33.1 scattermore_1.2 ggforce_0.4.1
[176] rsvd_1.0.5 broom_1.0.5 xtable_1.8-4 rstatix_0.7.2 later_1.3.2
[181] class_7.3-22 tibble_3.2.1 memoise_2.0.1 AnnotationDbi_1.64.1 beeswarm_0.4.0
[186] cluster_2.1.6 timechange_0.3.0 globals_0.16.2

amissarova commented 7 months ago

thanks for letting me know, then i take it the problem really was in compatability of new Seurat and scWGCNA.