Closed Li-ZhiD closed 1 day ago
I've resolved this issue on my Linux machine. You can have a try, and maybe SEDR's team @Li-ZhiD @HzFu @Xuhang01 can help fix this rpy2 bug.
```shell
conda install --yes rpy2
```
Compared with pip install, this prevents the Jupyter kernel from dying while creating an R environment inside your conda environment. See reference here. Then point rpy2 at the conda R installation:
```python
import os
os.environ['R_HOME'] = '/mnt/data/tool/miniconda3/envs/SEDR/lib/R'  # your conda env R path
os.environ['R_USER'] = '/mnt/data/tool/miniconda3/envs/SEDR/lib/python3.11/site-packages/rpy2'  # your conda env path that has installed rpy2
os.environ['R_LIBS'] = '/mnt/data/tool/miniconda3/envs/SEDR/lib/R/library'  # your conda env R library path
```

The R library path can also be modified in R using `.libPaths(path_to_your_lib)`.
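A quick sanity check on the `R_HOME` value: it should be the R installation prefix (the `.../lib/R` directory that contains `library/`), not the path of the `R` executable. A hypothetical helper for that check (`looks_like_r_home` is my own naming, not part of SEDR or rpy2):

```python
import os

def looks_like_r_home(path):
    # R_HOME should be a directory such as .../lib/R that contains
    # the package library, not the path of the R executable itself
    return os.path.isdir(path) and os.path.isdir(os.path.join(path, "library"))
```

If this returns False for the path you set, the kernel is likely to die when rpy2 starts R.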
You can verify the setting from Python:

```python
import rpy2.robjects as ro
r_home = ro.r('R.home()')
r_home
```
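If you would rather not hard-code the path, R itself can report its home directory via the `R RHOME` command. A small sketch that shells out to it (`detect_r_home` is my own helper; it returns None when no R executable is on the PATH):

```python
import shutil
import subprocess

def detect_r_home():
    """Return the R home directory reported by `R RHOME`, or None if R is not on PATH."""
    r_exe = shutil.which("R")
    if r_exe is None:
        return None
    out = subprocess.run([r_exe, "RHOME"], capture_output=True, text=True)
    return out.stdout.strip() or None
```

The returned path (when not None) is what `os.environ['R_HOME']` should be set to.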
```python
import numpy as np

def mclust_R(adata, n_clusters, use_rep='SEDR', key_added='SEDR', random_seed=2024):
    """
    Clustering using the mclust algorithm.
    The parameters are the same as those in the R package mclust.
    """
    # import os
    # os.environ['R_HOME'] = '/mnt/data/tool/miniconda3/envs/SEDR'
    # os.environ['R_USER'] = '/mnt/data/tool/miniconda3/envs/SEDR/lib/python3.11/site-packages/rpy2/'
    modelNames = 'EEE'
    np.random.seed(random_seed)

    import rpy2.robjects as robjects
    robjects.r.library("mclust")

    import rpy2.robjects.numpy2ri
    rpy2.robjects.numpy2ri.activate()

    r_random_seed = robjects.r['set.seed']
    r_random_seed(random_seed)
    rmclust = robjects.r['Mclust']
    res = rmclust(rpy2.robjects.numpy2ri.numpy2rpy(adata.obsm[use_rep]), n_clusters, modelNames)
    mclust_res = np.array(res[-2])  # the 'classification' component of the Mclust result

    adata.obs[key_added] = mclust_res
    adata.obs[key_added] = adata.obs[key_added].astype('int')
    adata.obs[key_added] = adata.obs[key_added].astype('category')
    return adata
```
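Because the function imports rpy2 lazily, a long pipeline only fails at clustering time when the R backend is broken. A hedged sketch of a pre-flight check (`has_r_backend` is my own naming, not part of SEDR):

```python
def has_r_backend():
    # True only if rpy2 can be imported, i.e. an R backend is reachable
    try:
        import rpy2.robjects  # noqa: F401
        return True
    except Exception:
        return False

# usage sketch (adata is an AnnData object with embeddings in adata.obsm['SEDR']):
# if has_r_backend():
#     adata = mclust_R(adata, n_clusters=7, use_rep='SEDR', key_added='SEDR')
```

Running this check once at the top of a script gives a clear error message instead of a dead kernel.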
We should comment out the first two lines (the `os.environ` settings inside the function) because we won't use them; they are already set globally. Hope the SEDR team can fix this. Now everything runs smoothly. Hope this helps.
To summarize: run `conda install rpy2` after creating a new conda environment, because conda will prepare everything including the R executable, which can't be achieved with `pip install rpy2`. For `os.environ['R_HOME']`, we should write `../envs/SEDR/lib/R` rather than `../envs/SEDR/bin/R`, even if `which R` locates your R executable at `../envs/SEDR/bin/R`. Otherwise, the kernel will die. For pip in the terminal, just refer to this article.

Some questions about the pipeline:
- What does the `12` in `graph_dict = SEDR.graph_construction(adata, 12)` mean? It's different according to various technologies. Is it a monotonic parameter?
- What does `N` mean in `sedr_net.train_with_dec(N=1)`?
@rocketeer1998 Thanks a lot for your assistance. Hope it works for @Li-ZhiD. For your questions:
Thanks for your quick response! For my question 4: SEDR runs much slower when I execute the same code for another run in the same Jupyter kernel. I don't know why. Do you have any ideas on how to run SEDR efficiently in a for loop?
Thanks a lot! I will try it later.
What is the relationship between the number of neighbors (6 or 12 or 18) and computational time?
@rocketeer1998 I have tested on Slide-seq, and it shows that using different numbers of neighbors does not change the computational time. You can also test it on different datasets.
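This matches what you would expect from a brute-force neighbor search: computing the pairwise distance matrix dominates the cost, and the number of neighbors only controls how many sorted columns are kept. A numpy-only sketch on synthetic coordinates (not SEDR's actual graph code):

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.random((500, 2))  # synthetic spatial coordinates

def knn_indices(coords, k):
    # the O(n^2) distance matrix is the expensive part; k only
    # selects how many sorted columns to keep afterwards
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]  # column 0 is the point itself
```

Calling `knn_indices(coords, k)` for k = 6, 12, or 18 touches the same distance matrix; only the returned `(n, k)` slice changes.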
@Xuhang01 After these days of testing, I'm still confused about why my question 4 exists. To elaborate: I've tested SEDR on my AnnData with 5000 cells and 200 genes. In scenario 1, it took 1 minute to run the SEDR pipeline on this data. In scenario 2, it took 72 minutes to run the same SEDR pipeline on the same data, because it was preceded by runs of the same SEDR pipeline on 5 other datasets in a for loop. It seems the computational efficiency of the remaining datasets in a for loop is greatly affected. Do you know why?
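One way to narrow this down is to time each iteration and force garbage collection between runs; with GPU pipelines, tensors accumulating across iterations are a common cause of such slowdowns. A minimal timing harness, where `run_pipeline` is a hypothetical stand-in for the SEDR calls:

```python
import gc
import time

def run_pipeline(data):
    # stand-in for the real SEDR pipeline on one dataset
    return sum(x * x for x in data)

datasets = [list(range(10_000)) for _ in range(3)]
timings = []
for data in datasets:
    t0 = time.perf_counter()
    run_pipeline(data)
    timings.append(time.perf_counter() - t0)
    gc.collect()  # drop leftover objects so later iterations start clean
    # with a GPU pipeline, also consider torch.cuda.empty_cache() here
```

If the per-iteration timings grow steadily, something is accumulating between runs; if only one dataset is slow, the data itself is the cause.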
@rocketeer1998 Hi, I have tried similar analyses to what you described, but did not run into the same problem. Could you share your script with me? I cannot guarantee that my code is the same as yours.
Hi @Li-ZhiD , @rocketeer1998 and @Xuhang01 (Thank you for sharing your code!)
After installing SEDR on my Linux machine, I listed what I had to do in addition to the instructions. I recently proposed a pull request to add those extra steps to the SEDR installation instructions.
I had not seen this issue before, but here are some things that might be relevant to it!
**mclust R package installation.** Here is an alternative strategy to @rocketeer1998's solution (adapted from the rpy2 documentation); you only have to do it once, in your environment:
```python
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector

# set R package names
packnames = ('mclust',)

# import R's utility package
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)  # select the first mirror in the list

# list and install missing packages
packnames_to_install = [x for x in packnames if not rpackages.isinstalled(x)]
if len(packnames_to_install) > 0:
    utils.install_packages(StrVector(packnames_to_install))
```
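After this one-time setup, it may be worth checking that mclust is actually visible from rpy2 before launching the pipeline. A small guard (my own sketch) that degrades to False when rpy2 itself is missing:

```python
def mclust_is_available():
    # True only when rpy2 imports cleanly and the mclust R package is installed
    try:
        import rpy2.robjects.packages as rpackages
        return bool(rpackages.isinstalled("mclust"))
    except Exception:
        return False
```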
**Using conda and an environment.yaml file to manage dependencies and automate environment creation.** An environment.yaml file that specifies the dependencies once and for all is a very handy approach that ensures reproducibility when creating environments.
```yaml
# environment.yaml
# Create a new environment: `conda env create -f environment.yaml`
# Update the existing environment: `conda env update -f environment.yaml`
name: SEDR
channels:
  - ...
dependencies:
  - ...
```
One solution by @rocketeer1998 mentions setting new environment variables in the form of `import os; os.environ["NEW_ENV_VARIABLE"] = "value"`. While I didn't have to set new environment variables in my setup, I'd like to mention another approach that I learned, using conda, as seen in conda's documentation:
```shell
# In terminal
conda activate SEDR
conda env config vars set NEW_ENV_VARIABLE=value
```
Or in the environment.yaml file (I prefer this one, to keep track of every modification I did):

```yaml
# environment.yaml
# Create: `conda env create -f environment.yaml`
# Update: `conda env update -f environment.yaml`
name: SEDR
channels:
  - ...
dependencies:
  - ...
variables:
  NEW_ENV_VARIABLE: value
```
Hope this helps! 🙂
@edoumazane Thank you very much! My Jupyter kernel shuts down when I run mclust. I checked and found the R environment is as follows, but it doesn't work after changing it to "/usr".