XuegongLab / HGC

fast hierarchical clustering for large-scale single-cell data

Incorrect number of dimensions when subsetting a Seurat object after applying FindClusteringTree(seurat, graph.type = "SNN") #5

Open victorjima opened 6 months ago

victorjima commented 6 months ago

Hello,

First of all, thank you for the package and for a method that makes clustering large single-cell datasets fast. I want to report an issue I recently encountered when using FindClusteringTree on a Seurat object.

I adopted your method because, during the dimensionality reduction steps, Seurat's FindClusters function stalled every time I ran it on a dataset with 480k cells and 69 layers after integration: even after five days it had not finished. So I looked for alternative clustering methods and found your package.

My dimensionality reduction and clustering steps now look like this:

```r
cll <- FindNeighbors(cll, reduction = integration,
                     dims = 1:50,
                     verbose = TRUE)
cll <- FindClusteringTree(cll, graph.type = "SNN") # HGC library method
cll <- RunUMAP(cll, reduction = integration, dims = 1:50,
               reduction.name = paste0("umap.", integration), reduction.key = "UMAP_",
               verbose = TRUE)

# Do clustering
cll@meta.data[, paste0(integration, "_clusters_snn08")] <- cutree(cll@graphs$ClusteringTree, k = round(69 * 0.2, 0))
```

This produced a clustering, but cutree returns one cluster with more than 400k cells while each of the others has only between one and a few hundred cells, and I don't know why.
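For reference, `k = round(69 * 0.2, 0)` evaluates to 14 (69 × 0.2 = 13.8). `cutree` always returns exactly `k` groups, but nothing forces those groups to be balanced: with some linkages, small outlying groups are split off before large dense ones are divided. Below is a minimal base-R sketch for inspecting cluster sizes at several cut levels. It assumes simulated 2-D data and uses `stats::hclust` with average linkage purely as a stand-in for the HGC tree; it is not the HGC algorithm.

```r
# Simulated stand-in data: 300 points in 2 dimensions (not real single-cell data)
set.seed(42)
emb <- rbind(
  matrix(rnorm(500, mean = 0), ncol = 2),  # 250 points in one dense blob
  matrix(rnorm(100, mean = 4), ncol = 2)   # 50 points in a second blob
)

# stats::hclust is used here only as a stand-in for the HGC clustering tree
tree <- hclust(dist(emb), method = "average")

k <- round(69 * 0.2, 0)  # 13.8 rounds to 14

# Tabulate cluster sizes (largest first) for a few cut levels to check balance
ks <- c(2, 5, k)
cluster_sizes <- lapply(ks, function(kk) sort(table(cutree(tree, k = kk)), decreasing = TRUE))
names(cluster_sizes) <- paste0("k=", ks)
print(cluster_sizes)
```

On the real object, the analogous check would be `table(cutree(cll@graphs$ClusteringTree, k = 14))`, repeated for a few values of `k` to see at which cut the large cluster starts to split.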

The real problem is that after this analysis I needed to process the object further, which required subsetting it. I tried both ways of subsetting:

```r
torem.final <- c("c42", "c46", "c47", "c49", "c53")

cll <- subset(cll, subset = RNA.pca_clusters_snn08 %in% torem.final)
```

or

```r
cll <- cll[, cll$RNA.pca_clusters_snn08 %in% torem.final]
```

but both always raised this error:

```
Error in x[[g]][cells.g, cells.g, drop = FALSE] : 
  incorrect number of dimensions
```

I tried many things: converting the Assay5 to an Assay, converting character columns to factors, removing reductions, checking the cell names in the object... None of it worked, and the object itself looked fine.

The issue became more concerning when I loaded a Seurat object from another project and found that subsetting it worked without this error.

The only thing I did differently was running FindClusteringTree. I tried subsetting both before and after this step, and the "incorrect number of dimensions" error appeared only after applying it.
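One hypothesis consistent with these observations, not confirmed by the HGC or Seurat documentation: because the tree is read back with `cutree(cll@graphs$ClusteringTree, ...)`, FindClusteringTree appears to store an `hclust`-style tree in the `graphs` slot, and the failing expression in the error, `x[[g]][cells.g, cells.g, drop = FALSE]`, indexes each graph as a cells × cells matrix. Two-index subsetting on a list-based object such as `hclust` raises exactly "incorrect number of dimensions". The base-R sketch below mimics that step on a plain list standing in for the `graphs` slot; the list, the helper `subset_graphs`, and the cell names are all hypothetical.

```r
set.seed(1)

# Hypothetical stand-in for a Seurat object's graphs slot: a named list
cells <- paste0("cell", 1:6)
snn <- diag(6)
dimnames(snn) <- list(cells, cells)
tree <- hclust(dist(matrix(rnorm(12), ncol = 2, dimnames = list(cells, NULL))))
graphs <- list(RNA_snn = snn, ClusteringTree = tree)

# Hypothetical helper mimicking the failing step: index every entry as cells x cells
subset_graphs <- function(graphs, keep) {
  lapply(graphs, function(g) g[keep, keep, drop = FALSE])
}

keep <- cells[1:3]

# With the tree present, two-index subsetting fails on the hclust (a list)
err <- tryCatch(subset_graphs(graphs, keep), error = function(e) conditionMessage(e))
print(err)

# Possible workaround: move the tree out of the list before subsetting
saved_tree <- graphs$ClusteringTree
graphs$ClusteringTree <- NULL
sub <- subset_graphs(graphs, keep)
print(dim(sub$RNA_snn))
```

On the real object, the analogous (untested) step would be to save the tree with `tree <- cll@graphs$ClusteringTree` and then run `cll@graphs$ClusteringTree <- NULL` before calling `subset()`. The saved tree still describes all cells, so it would need to be recomputed for the subset.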

I do not know whether I am applying your method incorrectly or whether this is caused by incompatible package versions. If you have any idea why this happens, it would be a great help.

Thank you again!

Víctor

Here's my session info:

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=es_ES.UTF-8           LC_COLLATE=en_US.UTF-8        LC_MONETARY=es_ES.UTF-8      
 [6] LC_MESSAGES=en_US.UTF-8       LC_PAPER=es_ES.UTF-8          LC_NAME=es_ES.UTF-8           LC_ADDRESS=es_ES.UTF-8        LC_TELEPHONE=es_ES.UTF-8     
[11] LC_MEASUREMENT=es_ES.UTF-8    LC_IDENTIFICATION=es_ES.UTF-8

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] HGC_1.10.0           patchwork_1.2.0      viridis_0.6.5        viridisLite_0.4.2    SeuratWrappers_0.3.3 harmony_1.2.0        Rcpp_1.0.12         
 [8] data.table_1.15.2    UCell_2.6.2          ggrepel_0.9.5        ggrastr_1.0.2        qs_0.25.7            lubridate_1.9.3      forcats_1.0.0       
[15] stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2          readr_2.1.5          tidyr_1.3.1          tibble_3.2.1         tidyverse_2.0.0     
[22] gridExtra_2.3        ggpubr_0.6.0         ggplot2_3.5.0        Seurat_5.0.2         SeuratObject_5.0.1   sp_2.1-3             Signac_1.12.9004    

loaded via a namespace (and not attached):
  [1] fs_1.6.3                    matrixStats_1.2.0           spatstat.sparse_3.0-3       bitops_1.0-7                enrichplot_1.22.0          
  [6] HDO.db_0.99.1               httr_1.4.7                  RColorBrewer_1.1-3          tools_4.3.1                 sctransform_0.4.1          
 [11] backports_1.4.1             utf8_1.2.4                  R6_2.5.1                    lazyeval_0.2.2              uwot_0.1.16                
 [16] withr_3.0.0                 progressr_0.14.0            cli_3.6.2                   Biobase_2.62.0              spatstat.explore_3.2-6     
 [21] fastDummies_1.7.3           scatterpie_0.2.1            labeling_0.4.3              spatstat.data_3.0-4         ggridges_0.5.6             
 [26] pbapply_1.7-2               Rsamtools_2.18.0            yulab.utils_0.1.4           gson_0.1.0                  R.utils_2.12.3             
 [31] DOSE_3.28.2                 parallelly_1.37.1           rstudioapi_0.15.0           RSQLite_2.3.5               RApiSerialize_0.1.2        
 [36] generics_0.1.3              gridGraphics_0.5-1          ica_1.0-3                   spatstat.random_3.2-3       dendextend_1.17.1          
 [41] xlsx_0.6.5                  car_3.1-2                   GO.db_3.18.0                Matrix_1.6-5                ggbeeswarm_0.7.2           
 [46] fansi_1.0.6                 S4Vectors_0.40.2            abind_1.4-5                 R.methodsS3_1.8.2           lifecycle_1.0.4            
 [51] yaml_2.3.8                  carData_3.0-5               SummarizedExperiment_1.32.0 SparseArray_1.2.4           qvalue_2.34.0              
 [56] Rtsne_0.17                  grid_4.3.1                  blob_1.2.4                  promises_1.2.1              crayon_1.5.2               
 [61] miniUI_0.1.1.1              lattice_0.22-5              cowplot_1.1.3               xlsxjars_0.6.1              KEGGREST_1.42.0            
 [66] pillar_1.9.0                knitr_1.45                  fgsea_1.29.1                GenomicRanges_1.54.1        future.apply_1.11.1        
 [71] codetools_0.2-19            fastmatch_1.1-4             leiden_0.4.3.1              glue_1.7.0                  ggfun_0.1.4                
 [76] remotes_2.4.2.1             vctrs_0.6.5                 png_0.1-8                   treeio_1.26.0               spam_2.10-0                
 [81] gtable_0.3.4                cachem_1.0.8                xfun_0.42                   S4Arrays_1.2.1              mime_0.12                  
 [86] RcppEigen_0.3.4.0.0         tidygraph_1.3.0             survival_3.5-8              SingleCellExperiment_1.24.0 RcppRoll_0.3.0             
 [91] pheatmap_1.0.12             rJava_1.0-11                ellipsis_0.3.2              fitdistrplus_1.1-11         ROCR_1.0-11                
 [96] nlme_3.1-164                ggtree_3.10.1               bit64_4.0.5                 RcppAnnoy_0.0.22            GenomeInfoDb_1.38.7        
[101] irlba_2.3.5.1               vipor_0.4.7                 KernSmooth_2.23-22          colorspace_2.1-0            BiocGenerics_0.48.1        
[106] DBI_1.2.2                   tidyselect_1.2.0            bit_4.0.5                   compiler_4.3.1              BiocNeighbors_1.20.2       
[111] DelayedArray_0.28.0         plotly_4.10.4               stringfish_0.16.0           shadowtext_0.1.3            scales_1.3.0               
[116] lmtest_0.9-40               digest_0.6.34               goftest_1.2-3               spatstat.utils_3.0-4        rmarkdown_2.26             
[121] XVector_0.42.0              htmltools_0.5.7             pkgconfig_2.0.3             MatrixGenerics_1.14.0       fastmap_1.1.1              
[126] rlang_1.1.3                 htmlwidgets_1.6.4           shiny_1.8.0                 farver_2.1.1                zoo_1.8-12                 
[131] jsonlite_1.8.8              mclust_6.1                  BiocParallel_1.36.0         R.oo_1.26.0                 GOSemSim_2.28.1            
[136] RCurl_1.98-1.14             magrittr_2.0.3              GenomeInfoDbData_1.2.11     ggplotify_0.1.2             dotCall64_1.1-1            
[141] munsell_0.5.0               ape_5.7-1                   reticulate_1.35.0           stringi_1.8.3               ggraph_2.2.0               
[146] zlibbioc_1.48.0             MASS_7.3-60.0.1             plyr_1.8.9                  parallel_4.3.1              listenv_0.9.1              
[151] deldir_2.0-4                Biostrings_2.70.2           graphlayouts_1.1.0          splines_4.3.1               tensor_1.5                 
[156] hms_1.1.3                   igraph_1.6.0                spatstat.geom_3.2-9         ggsignif_0.6.4              RcppHNSW_0.6.0             
[161] reshape2_1.4.4              stats4_4.3.1                evaluate_0.23               BiocManager_1.30.22         RcppParallel_5.1.7         
[166] tzdb_0.4.0                  tweenr_2.0.3                httpuv_1.6.14               RANN_2.6.1                  polyclip_1.10-6            
[171] future_1.33.1               scattermore_1.2             ggforce_0.4.2               rsvd_1.0.5                  broom_1.0.5                
[176] xtable_1.8-4                RSpectra_0.16-1             tidytree_0.4.6              rstatix_0.7.2               later_1.3.2                
[181] clusterProfiler_4.10.0      aplot_0.2.2                 beeswarm_0.4.0              memoise_2.0.1               AnnotationDbi_1.64.1       
[186] IRanges_2.36.0              cluster_2.1.6               timechange_0.3.0            globals_0.16.2