Nanostring-Biostats / InSituType

An R package for performing cell typing in SMI and other single cell data

error when n_clusts = 0 or 1 #56

Closed patrickjdanaher closed 3 years ago

patrickjdanaher commented 3 years ago

This bug might already be fixed; Mark Gregory ran into this using the version of nbclust in ptolemy:

[two screenshots attached; images not available]

Mark has code that fixes the ptolemy version. He says, "The algorithm made the assumption that there would be unsupervised clusters at multiple places and in multiple functions. I had to find each of those and modify them to allow for the possibility of no unsupervised clusters."
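
For illustration only, here is a minimal sketch of the kind of guard Mark describes. It is not the package's actual code: `init_profiles` and its arguments are hypothetical names, and the two pitfalls it avoids (`1:n` when `n` is 0, and a one-column matrix collapsing to a vector) are general R behaviors that could plausibly cause errors at `n_clusts = 0` or `1`, not confirmed causes of this bug.

```r
# Hypothetical helper, not part of InSituType/MLEcell/Ptolemy.
# Appends n_clusts randomly seeded profiles to the fixed reference profiles.
#   counts:         cells x genes matrix
#   fixed_profiles: genes x cell types matrix
init_profiles <- function(counts, fixed_profiles, n_clusts) {
  stopifnot(n_clusts >= 0,
            identical(colnames(counts), rownames(fixed_profiles)))

  # seq_len(0) is empty, whereas 1:0 would yield c(1, 0)
  idx <- seq_len(n_clusts)
  if (length(idx) == 0) {
    # No unsupervised clusters: use the fixed profiles only
    return(fixed_profiles)
  }

  # Use randomly chosen cells as starting profiles.
  # drop = FALSE keeps a matrix even when n_clusts = 1.
  seeds <- sample(nrow(counts), n_clusts)
  new_profiles <- t(counts[seeds, , drop = FALSE])
  colnames(new_profiles) <- paste0("cluster_", idx)

  cbind(fixed_profiles, new_profiles)
}
```

Under these assumptions, calling the helper with `n_clusts = 0` returns the fixed profiles unchanged, and `n_clusts = 1` returns a matrix with exactly one extra column.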

zhiiiyang commented 3 years ago

This bug has already been fixed. With n_clusts = 0, the current version runs to convergence:

library(MLEcell)
library(Ptolemy)
# main, 1000plex
data(mini_smi)
data(CPA16_RNAseq)
raw <- Matrix::t(mini_smi@expression$rna$raw)
neg <- Matrix::colMeans(mini_smi@expression$neg$raw)

semi <- MLEcell::cellEMClust(counts = raw,
                             neg = neg,
                             bg = NULL,
                             init_clust = NULL, n_clusts = 0,
                             fixed_profiles = CPA16_RNAseq,
                             nb_size = 10,
                             n_iters = 10,  # this is not enough
                             method = "EM",
                             shrinkage = 0.5,
                             subset_size = 200,
                             n_starts = 4,
                             n_benchmark_cells = 50,
                             n_final_iters = 10)
The following genes in the count data are missing from fixed_profiles and will be omitted: IGHG1,IGHA1
random start 1
iter 1
[1] -4.144763
[1] 1.843174
iter 2
[1] -4.040044
[1] 1.977227
iter 3
[1] -4.080381
[1] 2.037029
iter 4
[1] -4.372664
[1] 2.207741
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
random start 2
iter 1
[1] -5.425383
[1] 1.954161
iter 2
[1] -5.266462
[1] 2.099755
iter 3
[1] -4.639814
[1] 2.022563
iter 4
[1] -4.891979
[1] 2.057518
iter 5
[1] -5.491101
[1] 2.259452
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
random start 3
iter 1
[1] -4.863506
[1] 1.791374
iter 2
[1] -4.83185
[1] 1.854069
iter 3
[1] -4.943676
[1] 1.944675
iter 4
[1] -4.921363
[1] 1.975428
iter 5
[1] -5.294284
[1] 2.108963
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
random start 4
iter 1
[1] -4.970812
[1] 1.88726
iter 2
[1] -4.37023
[1] 1.947289
iter 3
[1] -4.466666
[1] 2.067341
iter 4
[1] -4.352529
[1] 1.996358
iter 5
[1] -4.267613
[1] 1.975928
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
clustering all cells
iter 1
[1] -4.787144
[1] 2.19474
iter 2
[1] -4.317163
[1] 2.066783
iter 3
[1] -4.529368
[1] 2.184971
iter 4
[1] -4.497862
[1] 2.176915
iter 5
[1] -4.428804
[1] 2.150542
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
zhiiiyang commented 3 years ago

Here is what it looks like for n_clusts = 1:

semi <- MLEcell::cellEMClust(counts = raw,
                             neg = neg,
                             bg = NULL,
                             init_clust = NULL, n_clusts = 1,
                             fixed_profiles = CPA16_RNAseq,
                             nb_size = 10,
                             n_iters = 10,  # this is not enough
                             method = "EM",
                             shrinkage = 0.5,
                             subset_size = 200,
                             n_starts = 4,
                             n_benchmark_cells = 50,
                             n_final_iters = 10)
The following genes in the count data are missing from fixed_profiles and will be omitted: IGHG1,IGHA1
random start 1
iter 1
[1] -5.056099
[1] 1.634106
iter 2
[1] -5.352063
[1] 1.914855
iter 3
[1] -5.752971
[1] 2.152586
iter 4
[1] -5.667739
[1] 2.136701
iter 5
[1] -5.918023
[1] 2.143393
iter 6
[1] -4.833617
[1] 1.92608
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
random start 2
iter 1
[1] -4.518445
[1] 1.561178
iter 2
[1] -4.668354
[1] 1.728138
iter 3
[1] -4.519271
[1] 1.878329
iter 4
[1] -4.237743
[1] 1.829438
iter 5
[1] -5.333327
[1] 1.994932
iter 6
[1] -5.03817
[1] 1.945925
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
random start 3
iter 1
[1] -4.640778
[1] 1.637359
iter 2
[1] -4.044361
[1] 1.821131
iter 3
[1] -4.088856
[1] 1.856151
iter 4
[1] -4.632565
[1] 1.931866
iter 5
[1] -4.116106
[1] 1.87245
iter 6
[1] -4.807827
[1] 2.042497
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
random start 4
iter 1
[1] -5.598454
[1] 1.886224
iter 2
[1] -6.287903
[1] 2.301688
iter 3
[1] -5.977605
[1] 2.209506
iter 4
[1] -5.816146
[1] 2.133653
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
clustering all cells
iter 1
[1] -4.876683
[1] 2.026882
iter 2
[1] -4.780239
[1] 2.17361
iter 3
[1] -4.614997
[1] 2.246028
iter 4
[1] -4.501536
[1] 2.18502
iter 5
[1] -4.493178
[1] 2.195785
iter 6
[1] -4.45799
[1] 2.162251
Converged: <= 0.01% of cell type assignments changed in the last iteration.
==========================================================================
patrickjdanaher commented 3 years ago

Thanks. Mark was working from an older version, so I think we're good now. Closing this issue.