Fufu-Hu opened this issue 7 months ago.
Hi @Fufu-Hu,
I am not sure what it could be.
1) Can you check whether anything is still running (e.g. using `top` or the Task Manager)?
2) Can you let me know the sizes of the objects (e.g. `dim(cg_output[["graph.dist"]])`, `dim(cg_output[["gene.expression"]])`)? I don't think it should be that slow if the size is 481x50, but it may be if you are using the full matrix.
3) Do you get any errors or notifications when you start the `cal_ot_mat_from_numpy` function?
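For question 2, a quick sanity check before launching a long run is to confirm that the two inputs agree: the OT cost should be a square meta-cell x meta-cell matrix, and one axis of the expression matrix should match it. The sketch below works on plain shape tuples. `check_ot_inputs` is a hypothetical helper, not part of gene_trajectory, and it accepts either orientation of the expression matrix because the thread shows both (481x50 in R, (1000, 11352) in Python).

```python
def check_ot_inputs(expr_shape, cost_shape):
    """Return (n_meta_cells, n_genes) if the shapes are compatible.

    Hypothetical helper: the cost matrix must be square, and either axis of
    the expression matrix may be the meta-cell axis.
    """
    rows, cols = cost_shape
    if rows != cols:
        raise ValueError(f"cost matrix must be square, got {cost_shape}")
    n_cells = rows
    if expr_shape[0] == n_cells:   # meta-cells x genes
        return n_cells, expr_shape[1]
    if expr_shape[1] == n_cells:   # genes x meta-cells
        return n_cells, expr_shape[0]
    raise ValueError(
        f"expression shape {expr_shape} does not match {n_cells} meta-cells"
    )


print(check_ot_inputs((1000, 11352), (1000, 1000)))  # (1000, 11352)
print(check_ot_inputs((481, 50), (50, 50)))          # (50, 481)
```

If the gene count returned is in the tens of thousands rather than a few hundred or thousand, that suggests the full matrix was passed instead of the selected genes.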
I encountered a similar problem.
It has been running for >4000 CPU hours without any progress bar appearing, in both Python and R. Machine info: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz
The program seems to work on another machine with the same data; there the progress bar appeared after ~30 CPU hours. Machine info: AMD Opteron(tm) Processor 6344
Hi @panyuwen,
It's hard to know what is going wrong in one machine when it works on another.
Using a subset of my original data (17k cells x 10k genes) with default parameters, it took about 2500 CPU hours from the start to the end of the `gene.dist.mat` step. The progress bar only appeared during the final 6 minutes (so only 6 minutes were recorded on the bar).
Machine info: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; CentOS 7
@panyuwen
Do you also select the top genes and coarse-grain the cells? The reference steps in the tutorial are:
```python
genes = select_top_genes(adata, layer='counts')
gene_expression_updated, graph_dist_updated = coarse_grain_adata(adata, graph_dist=cell_graph_dist, features=genes, dims=10)
```
If so, what are the dimensions of `gene_expression_updated` and `graph_dist_updated`?
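Coarse-graining matters because the ground cost handed to the OT solver is the graph distance matrix (`ot_cost = cg_output[["graph.dist"]]` in the R call), so its size grows with the square of the number of points. A back-of-the-envelope sketch, assuming a dense float64 matrix, comparing the 17k cells mentioned above with 1000 meta-cells (the size of `graph_dist_updated` reported in this thread). The helper names are hypothetical.

```python
def cost_matrix_entries(n_points):
    """Number of entries in a dense n x n ground-cost matrix."""
    return n_points * n_points


def megabytes_float64(n_entries):
    """Memory for n_entries double-precision values, in MB (1 MB = 1e6 bytes)."""
    return n_entries * 8 / 1e6


for label, n in [("all cells", 17_000), ("meta-cells", 1_000)]:
    entries = cost_matrix_entries(n)
    print(f"{label:>10}: {entries:,} entries, ~{megabytes_float64(entries):,.0f} MB")
#  all cells: 289,000,000 entries, ~2,312 MB
# meta-cells: 1,000,000 entries, ~8 MB
```

This only counts storage; the solver's runtime per gene pair also grows with the number of meta-cells, which is a further reason to coarse-grain.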
Yes, but I selected the genes manually.
`gene_expression_updated`: (1000, 11352); `graph_dist_updated`: (1000, 1000)
11352 genes is a large number, and calculating the earth mover's distance for every pair of genes is going to be very slow.
Try using ~2000 genes, selected with `select_top_genes` or a similar approach.
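To see why the gene count dominates, count the gene pairs. A minimal sketch, assuming one EMD is solved per unordered gene pair (the gene-gene distance matrix is symmetric); `n_gene_pairs` is a hypothetical helper, not a gene_trajectory function.

```python
def n_gene_pairs(n_genes):
    """Number of unordered gene pairs, i.e. distinct off-diagonal entries
    of a symmetric gene-gene distance matrix."""
    return n_genes * (n_genes - 1) // 2


for n in (481, 2_000, 11_352):
    print(f"{n:>6} genes -> {n_gene_pairs(n):,} EMD computations")

ratio = n_gene_pairs(11_352) / n_gene_pairs(2_000)
print(f"11352 genes need ~{ratio:.0f}x more pairs than 2000 genes")
```

This gives 115,440 pairs for 481 genes, 1,999,000 for 2000 genes, and 64,428,276 for 11352 genes, so cutting from 11352 to ~2000 genes removes roughly 32x of the work at a fixed number of meta-cells.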
Hi!
I installed the gene_trajectory module with pip in a conda env. I can compute the gene-gene distances with the Seurat data from the GeneTrajectory tutorial, and the progress bar is shown on screen. But when I compute them for my own Seurat data (36077 features across 482 samples), nothing appears on screen. The number of genes used to compute gene-gene distances is 481 and the number of meta-cells is 50. I ran `gene.dist.mat <- cal_ot_mat_from_numpy(ot_cost = cg_output[["graph.dist"]], gene_expr = cg_output[["gene.expression"]], num_iter_max = 50000, show_progress_bar = TRUE)` in R for at least 8 hours with no output, not even a progress bar. Is there something I missed?
Hope to receive a reply~
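Since the progress bar may only show up late in the run (as in the 2500 CPU-hour report above), one way to tell "slow" from "stuck" is to time the distance step on a small random subset of genes first and extrapolate. A minimal sketch of the extrapolation, assuming runtime scales roughly linearly with the number of gene pairs when the meta-cells and `num_iter_max` are unchanged; `extrapolate_hours` is a hypothetical helper, and the 60 s / 100 genes figures are placeholders, not measurements.

```python
def n_gene_pairs(n_genes):
    """Unordered gene pairs (one EMD each, assuming a symmetric distance matrix)."""
    return n_genes * (n_genes - 1) // 2


def extrapolate_hours(seconds_for_subset, n_subset_genes, n_full_genes):
    """Scale a measured time on a gene subset up to the full gene set.

    Assumes the cost per pair is roughly constant (same meta-cells, same
    num_iter_max), so total time is proportional to the number of pairs.
    """
    per_pair = seconds_for_subset / n_gene_pairs(n_subset_genes)
    return per_pair * n_gene_pairs(n_full_genes) / 3600


# Placeholder numbers: suppose 100 genes took 60 s on the same meta-cells.
print(f"{extrapolate_hours(60.0, 100, 2_000):.1f} h")  # 6.7 h
```

If a small subset already runs far slower than this kind of estimate suggests on another machine, the problem is more likely the environment than the data size.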