Closed sudhirthakurela closed 6 years ago
me toooooooooooo. everything right before clustering. but then from differentialGeneTest , things are going to crazy. I even dint know where I can see the result since it was always running
Hi there,
Could you please provide at least some code which we can take a look? It seems like both of you are using DDRTree to get clusters which we don't recommend to do. But I am not sure what is going and cannot provide any suggestions without more details
Here is the set of code that I am using:
scDNMTKO_Monocle <- detectGenes(scDNMTKO_Monocle, min_expr = 20) fData(scDNMTKO_Monocle)$use_for_ordering <- fData(scDNMTKO_Monocle)$num_cells_expressed > 0.05 * ncol(scDNMTKO_Monocle)
plot_pc_variance_explained(scDNMTKO_Monocle, return_all = F)
scDNMTKO_Monocle <- reduceDimension(scDNMTKO_Monocle, max_components = 2, norm_method = 'log', reduction_method = 'tSNE', verbose = T)
scDNMTKO_Monocle <- clusterCells(scDNMTKO_Monocle, verbose = F) plot_cell_clusters(scDNMTKO_Monocle, color_by = 'as.factor(Cluster)') plot_cell_clusters(scDNMTKO_Monocle, color_by = 'as.factor(Day)')
plot_rho_delta(scDNMTKO_Monocle, rho_threshold = 2, delta_threshold = 4 )
scDNMTKO_Monocle <- clusterCells(scDNMTKO_Monocle,rho_threshold = 5,delta_threshold = 4,skip_rho_sigma = T,verbose = F) plot_cell_clusters(scDNMTKO_Monocle, color_by = 'as.factor(Cluster)')
plot_cell_clusters(scDNMTKO_Monocle, color_by = 'as.factor(Day)') clustering_DEG_genes <- differentialGeneTest(scDNMTKO_Monocle[expressed_genes,],fullModelFormulaStr = '~Cluster', cores = 10)
Thanks,
ST
Sorry Xiaojie, sorry for unclear question. Here is my core code in R, HSMM_gene_annotation is a matrix whose rows are gene names and columns ares cell names , HSMM_sample_sheet is data.frame whose row are cells with one column "Hours",HSMM_sample_sheet is data.frame whose row are genes with one column "gene short name", and if you like, maybe I can also send you my workplace, it it too bug to upload to giuhub:
pd <- new("AnnotatedDataFrame", data = HSMM_sample_sheet) fd <- new("AnnotatedDataFrame", data = HSMM_gene_annotation) HSMM <- newCellDataSet(HSMM_expr_matrix, phenoData = pd, featureData = fd, lowerDetectionLimit = 0.5, expressionFamily = gaussianff())
rpc_matrix <- relative2abs(HSMM, method = "num_genes") HSMM <- newCellDataSet(as(as.matrix(rpc_matrix), "sparseMatrix"), phenoData = pd, featureData = fd, lowerDetectionLimit = 0.5, expressionFamily = negbinomial.size()) HSMM <- estimateSizeFactors(HSMM) HSMM <- estimateDispersions(HSMM)
HSMM <- detectGenes(HSMM, min_expr = 0.1) expressed_genes <- row.names(subset(fData(HSMM), num_cells_expressed >= 10))
disp_table <- dispersionTable(HSMM) unsup_clustering_genes <- subset(disp_table, mean_expression >= 0.1) HSMM <- setOrderingFilter(HSMM, unsup_clustering_genes$gene_id) plot_ordering_genes(HSMM) plot_pc_variance_explained(HSMM, return_all = F)
HSMM <- reduceDimension(HSMM, max_components = 2, num_dim = 6, reduction_method = 'tSNE', verbose = T) HSMM <- clusterCells(HSMM, num_clusters = 4) plot_cell_clusters(HSMM)
HSMM_myo <- estimateDispersions(HSMM) diff_test_res <- differentialGeneTest(HSMM_myo[expressed_genes,], fullModelFormulaStr = "~Hours") ordering_genes <- row.names (subset(diff_test_res, qval < 0.01))
HSMM_myo <- setOrderingFilter(HSMM_myo, ordering_genes) plot_ordering_genes(HSMM_myo)
HSMM_myo <- reduceDimension(HSMM_myo, max_components = 2, method = 'DDRTree') HSMM_myo <- orderCells(HSMM_myo) plot_cell_trajectory(HSMM_myo, color_by = "CellType")
Hoi xiaojie, Maybe you can ignore my problem. without changing anything, it works now
Hi ST,
Thanks for your code. The scripts look pretty good to me. Here are a few comments I have:
scDNMTKO_Monocle <- detectGenes(scDNMTKO_Monocle, min_expr = 20)
min_expr
was set way too high here. min_expr means that the minimal expression which we regarded as expressed. Normally it is 1 for UMI counts or 0.1 for FPKM or other relative expression measure scDNMTKO_Monocle <- reduceDimension(scDNMTKO_Monocle, max_components = 2, norm_method = 'log', reduction_method = 'tSNE', verbose = T)
You can pass a num_dim argument. to this function where the top number of pca components will be used for tSNE dimension reduction. The value for num_dim can be identified when you look at the scree plot from your previous plot_pc_variance_explained
function
3.differentialGeneTest(scDNMTKO_Monocle[expressed_genes,],fullModelFormulaStr = '~Cluster', cores = 10)
It seems to me you used expressed_genes to subset your cds. Please first test on a small set of genes with the following code differentialGeneTest(scDNMTKO_Monocle[1:100,],fullModelFormulaStr = '~Cluster', cores = 10)
or you can just use a single core:
differentialGeneTest(scDNMTKO_Monocle[1:100,],fullModelFormulaStr = '~Cluster', cores = 1)
Let me know how long does those code take to run
Hi,
Just to get a better idea about min_exp:
I have data from SmartSeq2 and supplying raw read counts while creating seurat object.
nbt=CreateSeuratObject(scDNMTKORNA,project="scDNMT1KO",min.cells = 30,names.field = 1, names.delim = "",is.expr=20)
Using 20 as cutoff results in ~10000 genes passing the criteria.
Regarding your 3 point, I had meanwhile ran it with just 200 genes and it was done within 5 minutes. I could also run it using all expressed genes but that was not possible in an interactive R session but it worked if I submit it as job to the cluster (took about an hour with 30 cores)
Thanks for suggestions. Will let you know how step 2 goes.
Run into the same issue. However, I received the following warning messages
Warning in inherits(x, "factor"): closing unused connection 8 (<-xxxxx.com:11936)
Warning in inherits(x, "factor"): closing unused connection 7 (<-xxxxx.com:11936)
Warning in inherits(x, "factor"): closing unused connection 6 (<-xxxxx.com:11936)
Warning in inherits(x, "factor"): closing unused connection 5 (<-xxxxx.com:11936)
(Note: xxxxx.com is the host name of my machine)
Not quite familiar with the parallel computing in R. It looks like that there is a disconnection between the master and the workers.
In addition, I have one related question.
@Xiaojieqiu recommended not using DDRTree
to get clusters. Is DDRTree
still the preferred method for reduceDimension
?
@yicheinchang reduceDimension now performs two tasks. The first one is to run RGE for trajectory reconstruction. By default, it uses DDRTree. The second one is clustering. The default approach for clustering is to use tSNE for dimension reduction. Hope this helps
hi there,
I am using Mocole and have around 700 cells and ~ 13000 expressed genes and 13 clusters after filtering out and running DDRTree. While running differentialGeneTest it is taking too long. I am using 10 cores. Moreover, I am also not sure if it is doing anything or stuck somewhere without throwing an error. Is there a way to speed up the process? Or do you think ~13000 genes is too high?
ST