cole-trapnell-lab / garnett

Automated cell type classification

Changing cluster_extend_max_frac_unknown with cluster_extend=TRUE does not result in fewer Unknown calls #55

Closed keithgmitchell closed 3 years ago

keithgmitchell commented 3 years ago

Describe the bug

I am not able to limit the number of Unknown calls.

To Reproduce

pbmc_cds2 <- classify_cells(agg.cds, pbmc_classifier,
                           db = org.Mm.eg.db,
                           cluster_extend = TRUE,
                           cluster_extend_max_frac_unknown = 0.95,
                           cluster_extend_max_frac_incorrect = 0.10,
                           cds_gene_id_type = "SYMBOL")
?classify_cells
#head(pData(pbmc_cds2))
table(pData(pbmc_cds2)$garnett_cluster)
table(pData(pbmc_cds2)$cell_type)
table(pData(pbmc_cds2)$cluster_ext_type)
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
382  74 226 394 367 250 312 388 254  60 258  30 102 227 327 198 139 373 290 232 577 

                   B-Cells                  Basophils                CD8 T Cells         Developing B-Cells 
                        45                         62                          6                         51 
               Endothelium                 Neutrophil                   NK Cells Platelets and Erythrocytes 
                         4                         87                         17                        569 
          Stromal Cells???                    Unknown 
                        64                       4555 

                   B-Cells                  Basophils                CD8 T Cells         Developing B-Cells 
                        45                         62                          6                         51 
               Endothelium                 Neutrophil                   NK Cells Platelets and Erythrocytes 
                         4                         87                         17                        569 
          Stromal Cells???                    Unknown 
                        64                       4555 

Expected behavior

Shouldn't table(pData(pbmc_cds2)$cluster_ext_type) come back without Unknown, as in the tutorial output below? I have tried many parameter values here.

table(pData(pbmc_cds)$cell_type)
# B cells CD4 T cells CD8 T cells     T cells     Unknown
#     207         129          61         164         239

table(pData(pbmc_cds)$cluster_ext_type)
# B cells CD4 T cells     T cells 
#     403         190         207 

hpliner commented 3 years ago

Hello, the cluster extension uses three criteria to decide whether to reclassify the unknown cells in a cluster. All three must hold:

  1. the fraction of the cluster that is unclassified (unknown) is less than cluster_extend_max_frac_unknown (default 0.95),
  2. the fraction of the classified cells that are a single dominant type is greater than 1 - cluster_extend_max_frac_incorrect (default 0.1), and
  3. at least 5 cells are classified.

If you make a two-way table of cell_type and garnett_cluster, you should be able to see which of these criteria isn't satisfied for each cluster. To force the cluster extension to classify more cells, you'll need to change the two parameters above from their default values.
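
As a rough illustration of that check, here is a minimal base-R sketch that builds the two-way table and tests the three criteria per cluster. It is not Garnett's internal code. The toy `meta` data frame, the `check_cluster` helper, and the local threshold variables are hypothetical names made up for this example, and the strictness of each inequality is assumed from the description above.

```r
# Toy stand-in for the cell metadata; in a real session you would use
# something like: meta <- as.data.frame(pData(pbmc_cds2))
meta <- data.frame(
  garnett_cluster = rep(c(1, 2, 3), times = c(100, 100, 100)),
  cell_type = c(
    rep(c("B-Cells", "Unknown"), times = c(40, 60)),                   # one clear type
    rep(c("B-Cells", "Neutrophil", "Unknown"), times = c(10, 10, 80)), # mixed types
    rep(c("NK Cells", "Unknown"), times = c(3, 97))                    # almost all unknown
  ),
  stringsAsFactors = FALSE
)

# Two-way table: rows are cell types, columns are clusters
tab <- table(meta$cell_type, meta$garnett_cluster)
print(tab)

# Thresholds as described in the comment above (assumed defaults)
max_frac_unknown   <- 0.95
max_frac_incorrect <- 0.10
min_classified     <- 5

# Evaluate the three criteria for one cluster's named vector of counts
check_cluster <- function(counts) {
  unknown <- if ("Unknown" %in% names(counts)) counts[["Unknown"]] else 0
  known   <- counts[names(counts) != "Unknown"]
  n_known <- sum(known)
  frac_unknown  <- unknown / sum(counts)
  frac_dominant <- if (n_known > 0) max(known) / n_known else 0
  c(frac_unknown_ok = frac_unknown < max_frac_unknown,
    dominant_ok     = frac_dominant > 1 - max_frac_incorrect,
    enough_cells_ok = n_known >= min_classified)
}

# One row per cluster, one logical column per criterion
checks <- t(sapply(colnames(tab), function(cl) check_cluster(tab[, cl])))
print(checks)
```

Any cluster with a FALSE in its row would keep its Unknown cells under these assumptions. The column that is FALSE indicates which parameter, if any, is worth loosening.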

hpliner commented 3 years ago

I'm going to close this. If it remains an issue, please reopen.