Open chen-peng-1874 opened 2 months ago
Did you install the gene-trajectory module?
You can try the code below in R:
reticulate::py_install("gene-trajectory")
Yes, I installed it, but the error is still present. I am wondering if there's something wrong with python.dll, although I do have python310.dll.
Hi, I'm not sure what the issue is, but can you try running reticulate::py_list_packages() and check the output? You should have a line like
14 gene-trajectory 1.0.0 gene-trajectory=1.0.0 pypi
If gene-trajectory is not there, can you try installing it with reticulate::py_install("gene-trajectory", pip = TRUE)? The pip = TRUE option may be needed since we do not have a conda package for gene-trajectory.
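The same check can also be done from inside Python, which helps confirm that the interpreter reticulate points to is the one where the package was installed. This is only a minimal sketch using the standard library; the helper name installed_version is made up for illustration and is not part of gene-trajectory or reticulate.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for this thread, not a
    gene-trajectory or reticulate function.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("gene-trajectory")
    if version is None:
        print("gene-trajectory is not installed in this Python environment")
    else:
        print(f"gene-trajectory {version} is installed")
```

If this reports that the package is missing while reticulate::py_list_packages() lists it, the two are probably looking at different Python environments.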
data_S <- GeneTrajectory::RunDM(data_S)
cell.graph.dist <- GetGraphDistance(data_S, K = 10)
cg_output <- CoarseGrain(data_S, cell.graph.dist, genes, N = 1000)
Hello, the data is too big to run in R. Can the gene distance computation above be run in Python instead?
Yes, it's possible to export the data to a folder and run the computation in Python, as described in https://github.com/KlugerLab/GeneTrajectory/issues/3#issuecomment-2070566770
It may also be worth reducing the data size as explained in https://klugerlab.github.io/GeneTrajectory/articles/fast_computation.html
data_S <- GeneTrajectory::RunDM(data_S)
Thank you for your reply. But the problem occurs in this step: the error says that the data is greater than 1000GiB. Is there a good solution?
I see, it's possible to do the whole analysis in Python (see e.g. https://github.com/KlugerLab/GeneTrajectory-python and https://genetrajectory-python.readthedocs.io/latest/notebooks/tutorial_mouse_dermal.html for a tutorial).
However, I am afraid you will encounter similar issues. Computing the diffusion map in RunDM creates a cell-cell distance matrix, which is quadratic in the number of cells and requires a lot of memory and time to run.
How many cells do you have?
I see, I will try Python first. However, we have about 340,000 cells. Do you have any more suggestions?
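For a sense of scale, here is a back-of-the-envelope estimate of how much memory a single dense cell-cell matrix needs. It assumes one 8-byte float per entry and nothing else; the actual allocation in RunDM may differ (for example through intermediate copies), so treat the numbers as a lower bound rather than a prediction of the exact error message.

```python
def dense_matrix_gib(n_cells: int, bytes_per_value: int = 8) -> float:
    """Memory in GiB for one dense n_cells x n_cells matrix.

    Assumes 8-byte (float64) entries by default; this is an assumption
    for illustration, not a statement about the package internals.
    """
    return n_cells * n_cells * bytes_per_value / 2**30


if __name__ == "__main__":
    # 340,000 is the cell count mentioned in this thread;
    # 10,000 is a typical subsample size.
    for n in (10_000, 340_000):
        print(f"{n:>7,} cells: {dense_matrix_gib(n):,.1f} GiB")
```

Under these assumptions a 10,000-cell matrix stays below 1 GiB, while 340,000 cells need several hundred GiB for one matrix alone.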
I would try randomly subsampling cells to a smaller number (~10k should be manageable, but you can probably do more) or partitioning the data if you have some meaningful metadata. You can then run RunDM and follow the rest of the pipeline (which will use CoarseGrain to coarse-grain the cells down to 1000-2000, or a procedure like https://klugerlab.github.io/GeneTrajectory/articles/fast_computation.html to further coarse-grain).
Python and R should have similar performance, so use whichever makes the most sense for you.
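A minimal sketch of the random subsampling step, using only the Python standard library. The function name subsample_indices and the fixed seed are illustrative choices, not part of GeneTrajectory; the sketch only picks which cells to keep and leaves the actual subsetting to whatever data structure you use.

```python
import random


def subsample_indices(n_cells: int, n_keep: int, seed: int = 0) -> list:
    """Pick n_keep distinct cell indices uniformly at random, sorted ascending.

    Hypothetical helper for illustration. A fixed seed keeps the
    subsample reproducible across runs.
    """
    if n_keep >= n_cells:
        # Nothing to drop: keep every cell.
        return list(range(n_cells))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cells), n_keep))


# Example with the numbers from this thread: ~340,000 cells down to 10,000.
idx = subsample_indices(340_000, 10_000)
print(len(idx), idx[:5])
```

The resulting indices can then be used to subset the object before running the diffusion map, for example with integer indexing on an AnnData object in Python. Partitioning by metadata would replace the uniform draw with a per-group selection.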
It should be possible to subsample large datasets in a better way than at random, but we haven't investigated that yet. The method we use to coarse-grain cells, CoarseGrain, relies on having a cell-cell distance matrix. One could probably try a similar kNN-based approach on a simpler gene embedding that could handle data of your size, but we haven't tested it and it's hard to predict whether it would behave correctly.
Thank you, I'll try your advice.
Can I use dm_res = palantir.utils.run_diffusion_maps(ad, n_components=5) instead of run_dm(adata) to calculate the intercellular distance?
I don't have experience with that package, but the implementation looks similar. I think you can try it as an alternative; just make sure to refer to the layer where the result is stored (our package uses "X_dm", so change it accordingly).
I tried to set up a virtualenv using reticulate; however, the module cannot be found. Here is the output:
> cal_ot_mat_from_numpy <- reticulate::import('gene_trajectory.compute_gene_distance_cmd')$cal_ot_mat_from_numpy
Error: C:/Users/Public/miniconda3/python310.dll - The specified module could not be found.
Should I use Python instead of R?