cole-trapnell-lab / monocle-release


kmeans centers not distinct #437

Open Junjie-Hu opened 3 years ago

Junjie-Hu commented 3 years ago

I get the same error as in #26. However, manually setting ncenter = 100, 50 or 10 does not help, even though the number of cells is 1099. With auto_param_selection = F, it takes a very long time. How can I deal with this? My code is:

library(monocle)

# Select DCs

Mono_tj <- subset(RNA, idents = c(5, 9, 10))

# Extract data, phenotype data, and feature data from the SeuratObject

data <- as(as.matrix(Mono_tj@assays$RNA@counts), 'sparseMatrix')
pd <- new('AnnotatedDataFrame', data = Mono_tj@meta.data)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
fd <- new('AnnotatedDataFrame', data = fData)

# Construct the monocle cds

monocle_cds <- newCellDataSet(data, phenoData = pd, featureData = fd, lowerDetectionLimit = 0.5, expressionFamily = negbinomial.size())
monocle_cds <- estimateSizeFactors(monocle_cds)
monocle_cds <- estimateDispersions(monocle_cds)

gene_id <- c("LAMP3", "FSCN1", "CCR7", "CCL19", "CCL22")
monocle_cds <- setOrderingFilter(monocle_cds, gene_id)
monocle_cds <- reduceDimension(monocle_cds, max_components = 2, method = 'DDRTree')
Error in kmeans(t(Z), K, centers = centers): initial centers are not distinct.

dim(monocle_cds)
Features  Samples
   24292     1099

# Set ncenter = 100
monocle_cds <- reduceDimension(monocle_cds, max_components = 2, method = 'DDRTree', ncenter = 100)
Error in kmeans(t(Z), K, centers = centers): initial centers are not distinct.

# Set ncenter = 50
monocle_cds <- reduceDimension(monocle_cds, max_components = 2, method = 'DDRTree', ncenter = 50)
Error in kmeans(t(Z), K, centers = centers): initial centers are not distinct.

# Set ncenter = 10
monocle_cds <- reduceDimension(monocle_cds, max_components = 2, method = 'DDRTree', ncenter = 10)
Error in kmeans(t(Z), K, centers = centers): initial centers are not distinct.
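The failing line can be reproduced with base R alone. stats::kmeans() rejects a user-supplied centers matrix that contains duplicated rows, and DDRTree (as quoted later in this thread) builds that matrix by taking K evenly spaced cells from the reduced embedding Z. If many cells share the same coordinates in Z, duplicated starting centres are likely for almost any K, which would be consistent with ncenter = 100, 50 and 10 all failing. A minimal sketch, assuming a hypothetical toy Z in which most of the 1099 cells collapse onto one point:

```r
# Toy embedding Z (dimensions x cells). Hypothetical: every cell except every
# 7th one sits at (0, 0), mimicking many cells with identical profiles.
set.seed(1)
n_cells <- 1099
Z <- matrix(0, nrow = 2, ncol = n_cells)
varied <- seq(7, n_cells, by = 7)
Z[, varied] <- matrix(rnorm(2 * length(varied)), nrow = 2)

# Same indexing as the DDRTree line quoted in this thread:
# K evenly spaced cells (fractional indices are truncated by R).
pick_centers <- function(Z, K) {
  t(Z)[seq(1, ncol(Z), length.out = K), , drop = FALSE]
}

# Run kmeans with those starting centres and return "ok" or the error message.
try_kmeans <- function(Z, K) {
  centers <- pick_centers(Z, K)
  tryCatch({
    kmeans(t(Z), centers = centers)
    "ok"
  }, error = function(e) conditionMessage(e))
}

Ks <- c(121, 100, 50, 10)
msgs <- vapply(Ks, function(K) try_kmeans(Z, K), character(1))
n_dup <- vapply(Ks, function(K) sum(duplicated(pick_centers(Z, K))), integer(1))
print(data.frame(K = Ks, duplicated_centers = n_dup, kmeans = msgs))
```

With real data, which values of K fail depends on how the tied cells are ordered in Z, so this only illustrates the mechanism.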

Junjie-Hu commented 3 years ago

Looking at the code, I suspect a bug in the reduceDimension function. The error comes from:

ddrtree_res <- do.call(DDRTree, ddr_args)

Inside the DDRTree function, the failing call is kmeans:

K <- ncenter  # 121 in my dataset
centers = t(Z)[seq(1, ncol(Z), length.out = K), ]  # ncol(Z) = 1099
kmean_res <- kmeans(t(Z), K, centers = centers)

So, what is K used for?
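Two details of the quoted lines can be checked directly in R. First, K = 121 for 1099 cells is consistent with an automatic heuristic of the form below; Monocle 2 appears to use something like it when auto_param_selection = TRUE, but treat the formula (and the helper name cal_ncenter_sketch) as an assumption. Second, in kmeans(t(Z), K, centers = centers) the centers argument is passed by name, so under R's argument-matching rules the positional K is bound to the next unmatched formal of kmeans(), which is iter.max. In that call K would therefore only cap the number of iterations, while the number of clusters comes from nrow(centers). match.call() shows the binding without evaluating anything:

```r
# Hypothetical sketch of an automatic ncenter heuristic (assumption, see text).
cal_ncenter_sketch <- function(ncells, ncells_limit = 100) {
  round(2 * ncells_limit * log(ncells) / (log(ncells) + log(ncells_limit)))
}
print(cal_ncenter_sketch(1099))

# How R binds the arguments of the kmeans() call quoted above.
# match.call() only matches the call against the formals of stats::kmeans,
# so Z, K and centers do not need to exist.
mc <- match.call(stats::kmeans, quote(kmeans(t(Z), K, centers = centers)))
print(mc)
print(mc$iter.max)  # the symbol K: the positional argument fills iter.max
```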

notguigao commented 3 years ago


Same problem here. I tried different options for the "method" parameter and got the same error message, so perhaps the bug does not stem from the DDRTree function?

notguigao commented 3 years ago

It seems that I've solved this problem. The "not distinct" error may have been caused by an earlier step: I was comparing the genes chosen by Seurat and by Monocle, calling setOrderingFilter() twice on the same object:

var.genes <- VariableFeatures(seurat_object)
mycds <- setOrderingFilter(mycds, var.genes)
p1 <- plot_ordering_genes(mycds)

disp_table <- dispersionTable(mycds)
disp.genes <- subset(disp_table, mean_expression >= 0.1 & dispersion_empirical >= 1 * dispersion_fit)$gene_id
mycds <- setOrderingFilter(mycds, disp.genes)
p2 <- plot_ordering_genes(mycds)

p1|p2

After I rebuilt mycds and ran setOrderingFilter() only once, reduceDimension() worked fine :-D
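The dispersion-based filter above keeps genes whose mean expression is at least 0.1 and whose empirical dispersion is at least the fitted dispersion. Here is a toy illustration of that subset() call on a hypothetical stand-in for dispersionTable(mycds); only the column names come from the snippet, while the gene names and values are invented. Checking length(disp.genes) before calling setOrderingFilter() also shows how many ordering genes you are about to use:

```r
# Hypothetical stand-in for dispersionTable(mycds), with only the columns
# used in the filter above. Values are invented for illustration.
disp_table <- data.frame(
  gene_id              = c("G1", "G2", "G3", "G4", "G5"),
  mean_expression      = c(0.05, 0.5, 1.2, 2.0, 0.3),
  dispersion_fit       = c(1.0, 0.8, 0.6, 0.5, 0.9),
  dispersion_empirical = c(3.0, 0.4, 1.1, 0.9, 2.0),
  stringsAsFactors = FALSE
)

# Same filter as in the comment: enough expression, and dispersion at or
# above the fitted mean-dispersion trend.
disp.genes <- subset(disp_table,
                     mean_expression >= 0.1 &
                       dispersion_empirical >= 1 * dispersion_fit)$gene_id

print(disp.genes)
print(length(disp.genes))  # how many ordering genes would be passed on
```

G1 is dropped for low mean expression and G2 for dispersion below the fit; the rest pass.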

mujiangxielu commented 2 years ago

I had exactly the same problem. I didn't get this error when running the project in RStudio Server, but I got it when running in the same environment from the shell. I tried running setOrderingFilter() only once, setting ncenter = 100, 50 or 10, and setting auto_param_selection = F; none of these worked. Did you solve it?

Gliese-Tan commented 7 months ago


You could check whether the gene set passed to setOrderingFilter() is too small.
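One way to act on this suggestion is to count how many cells have identical expression over the ordering genes. With only a handful of genes (five in the original post), many cells can share the same profile, often all zeros, and such cells become identical points in the reduced embedding, which can yield duplicated kmeans starting centres. A base-R sketch; the helper name count_tied_cells and the toy counts are hypothetical. On real data you might call it as count_tied_cells(data, gene_id) with the objects from the original post:

```r
# Hypothetical helper: summarise how many cells share an identical
# expression profile over the chosen ordering genes.
count_tied_cells <- function(expr, ordering_genes) {
  genes <- intersect(ordering_genes, rownames(expr))
  profiles <- t(as.matrix(expr[genes, , drop = FALSE]))  # one row per cell
  c(n_genes    = length(genes),
    n_cells    = nrow(profiles),
    n_distinct = nrow(unique(profiles)),
    n_all_zero = sum(rowSums(profiles) == 0))
}

# Toy counts (invented): 5 sparse marker genes plus 495 other genes, 1099 cells.
set.seed(42)
n_cells <- 1099
markers <- matrix(rpois(5 * n_cells, lambda = 0.1), nrow = 5,
                  dimnames = list(paste0("marker", 1:5), NULL))
others <- matrix(rpois(495 * n_cells, lambda = 2), nrow = 495,
                 dimnames = list(paste0("gene", 1:495), NULL))
expr <- rbind(markers, others)

small <- count_tied_cells(expr, rownames(markers))  # 5 ordering genes
large <- count_tied_cells(expr, rownames(expr))     # 500 ordering genes
print(rbind(small, large))
```

If n_distinct is far below n_cells for your ordering genes, a larger gene set (for example the dispersion-based selection shown earlier in the thread) is worth trying.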

hezuoxi commented 6 months ago

I have the same problem. Has anyone solved it?

hezuoxi commented 6 months ago

@Gliese-Tan Hello, have you solved this problem?