TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling
https://cran.r-project.org/web/packages/sjSDM/index.html
GNU General Public License v3.0

Installation on HPC #125

Closed: YJ781 closed this issue 1 year ago

YJ781 commented 1 year ago

Hi sjSDM developers,

Thank you very much for your support in installing and using this interesting package. I have a large dataset, and running sjSDM_cv on my laptop has become slow. Unfortunately, I don't have access to an NVIDIA GPU, so I am considering running the package on a supercomputer with a Linux operating system. However, I have run into difficulties installing PyTorch there.

To begin with, I created a conda environment and installed the latest version of R. I then tried installing sjSDM with the install_sjSDM() function, as well as with the manual installation method described in the ?installation_help documentation. Both attempts failed. Do you have any idea what might be going wrong? Thanks!

MaximilianPi commented 1 year ago

Hi @YJ781,

Can you please show the output of reticulate::py_config()?

I think one problem could be that you have mixed two different installation setups. It is better to either install everything manually with conda or use the R function install_sjSDM().

Conda manual installation:

  1. Remove existing conda environments

    • List all conda environments: $conda env list
    • Delete them (except for the base environment), either by removing the environment's folder: $rm -r ~/miniconda3/envs/<name of environment> (adjust the path to your installation), or with the conda command: $conda remove -n ENV_NAME --all
    • Make sure that you have only one miniconda installation (install_sjSDM() also installs its own miniconda version; delete it)
  2. Set up the conda environment: $conda create -n r-sjsdm python=3.9

    • Activate it: $conda activate r-sjsdm
    • If your supercomputer doesn't have an NVIDIA GPU: $conda install pytorch torchvision torchaudio cpuonly -c pytorch
    • Otherwise: $conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
    • Install the remaining dependencies: $python -m pip install pyro-ppl torch_optimizer madgrad tqdm

  3. Test the installation: start R and load sjSDM, but do not run install_sjSDM()! Then try the minimal example.
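Before starting R, the Python side of step 3 can also be checked directly with the environment's own interpreter. The script below is a minimal sketch, not part of sjSDM: it only looks up the modules installed in step 2 (pyro-ppl is imported as `pyro`) and asks torch whether it can see a CUDA GPU. The file name `check_sjsdm_deps.py` is hypothetical; run it with `python check_sjsdm_deps.py` after activating r-sjsdm. Whether R uses this same interpreter is what reticulate::py_config() shows.

```python
"""Check that the Python packages sjSDM relies on can be found (a sketch).

The module names are taken from the install commands in step 2;
pyro-ppl is imported as "pyro".
"""
import importlib
import importlib.util

REQUIRED_MODULES = ("torch", "pyro", "torch_optimizer", "madgrad", "tqdm")


def check_sjsdm_python_deps(modules=REQUIRED_MODULES):
    """Map each module name to True if it is importable here, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def cuda_available():
    """True/False if torch sees a CUDA GPU; None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    torch = importlib.import_module("torch")
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    for name, found in check_sjsdm_python_deps().items():
        print(f"{name:<16} {'ok' if found else 'MISSING'}")
    print("CUDA GPU visible to torch:", cuda_available())
```

`find_spec` only locates a module without importing it, so a missing package shows up as `MISSING` instead of raising an ImportError; on a CPU-only node the last line should print `False`.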

YJ781 commented 1 year ago

Hi @MaximilianPi,

Thank you so much for your detailed response. I got help from our university's supercomputer support team. They used the following steps:

    • $python -m venv r-sjsdm
    • $source ~/sjsdm/r-sjsdm/bin/activate
    • $pip install --upgrade pip
    • $pip install torch_optimizer pyro-ppl madgrad
    • In R: install.packages("sjSDM", repos = "https://cran.case.edu/")


This approach worked, so I moved on to processing my data. However, the calculations remain slow, so I still need an NVIDIA GPU. Nevertheless, thank you for your help!
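With the venv approach, a common pitfall is that R ends up using a different Python than the one the packages were installed into. The check below is a minimal standard-library sketch (not part of sjSDM): run with `python` after `source ~/sjsdm/r-sjsdm/bin/activate`, it should print an interpreter path under ~/sjsdm/r-sjsdm and `True`. reticulate::py_config() in R should then report the same path; if it does not, reticulate::use_virtualenv() or the RETICULATE_PYTHON environment variable can point reticulate at the venv (see the reticulate documentation for the details).

```python
"""Show which Python interpreter is active and whether it is a venv (a sketch)."""
import sys


def in_virtualenv():
    """True when running inside a venv created with `python -m venv`.

    In a venv, sys.prefix points at the environment, while sys.base_prefix
    still points at the Python installation the venv was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtualenv())
```

This prefix comparison is specific to `venv`; a conda environment is a full Python installation of its own, so the check is not meaningful there.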

MaximilianPi commented 1 year ago

Hi @YJ781,

Good to hear! What are you calculating? You could speed up the calculations by using fewer Monte Carlo samples in sjSDM and in the ANOVA (samples/sampling = ...). This reduces the accuracy of the results, but it might be worth it while you are still exploring the models/data.
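The accuracy cost of fewer samples follows the usual Monte Carlo rule: the spread of an estimate shrinks roughly as 1/sqrt(number of samples), so cutting the samples by 10× makes the estimate about 3× noisier. The Python sketch below is a generic illustration of that rule, not sjSDM's internal code; the target quantity P(Z > 1) for a standard normal Z and the helper names are arbitrary choices for the demo.

```python
"""Generic illustration: Monte Carlo error shrinks roughly as 1/sqrt(samples).

Not sjSDM code; P(Z > 1) for a standard normal Z is an arbitrary target.
"""
import random
import statistics


def mc_estimate(n_samples, rng):
    """Monte Carlo estimate of P(Z > 1) from n_samples standard-normal draws."""
    hits = sum(1 for _ in range(n_samples) if rng.gauss(0.0, 1.0) > 1.0)
    return hits / n_samples


def mc_spread(n_samples, n_repeats=200, seed=1):
    """Standard deviation of the estimate over n_repeats independent runs."""
    rng = random.Random(seed)
    estimates = [mc_estimate(n_samples, rng) for _ in range(n_repeats)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        # Theory: sqrt(0.159 * 0.841 / n), i.e. about 0.037, 0.012 and 0.005.
        print(n, round(mc_spread(n), 4))
```

Compute time grows linearly with the number of samples while the error only falls with its square root, which is why a smaller sampling value is a reasonable trade-off during exploration.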

YJ781 commented 1 year ago

Hi @MaximilianPi,

Actually, my community isn't very large, with only 1000 ASVs. The slowness is in the tuning function sjSDM_cv, while the function sjSDM itself is very fast. Right now I set the sjSDM_cv parameters as follows: sampling = 100, learning_rate = 0.01, iter = 100L, CV = 10. Which parameter would you suggest adjusting to speed up processing without significantly compromising accuracy? Thanks!
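One plausible reason why sjSDM_cv is so much slower than a single sjSDM() call is the number of models it fits. The arithmetic below is a back-of-the-envelope sketch under two assumptions not stated in the thread: each hyperparameter combination is fitted once per CV fold, and `tune_steps` (the number of combinations tried) is 20, which is assumed here to be the default; check ?sjSDM_cv.

```python
"""Back-of-the-envelope workload of a tuning run with k-fold cross-validation.

Assumption (not taken from the sjSDM source): each of the tune_steps
hyperparameter combinations is fitted once per CV fold.
"""


def cv_workload(tune_steps, cv_folds, iter_per_fit):
    """Return (number of model fits, total optimization iterations)."""
    fits = tune_steps * cv_folds
    return fits, fits * iter_per_fit


# Settings from the question; tune_steps = 20 is an assumed default.
fits, total_iter = cv_workload(tune_steps=20, cv_folds=10, iter_per_fit=100)
print(f"{fits} fits, {total_iter} iterations vs. 100 for a single sjSDM() fit")
# → 200 fits, 20000 iterations vs. 100 for a single sjSDM() fit

# Halving the number of folds halves the number of fits.
print(cv_workload(tune_steps=20, cv_folds=5, iter_per_fit=100))
# → (100, 10000)
```

Under these assumptions the tuning run does on the order of 200 times the work of one fit, each on 90% of the sites, so fewer folds or fewer tuning steps shrink the cost proportionally, at the price of a coarser or noisier hyperparameter search.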

MaximilianPi commented 1 year ago

Hi @YJ781, I meant the sampling argument, which you have already set to the smallest recommended value. Yes, I guess with 1000 ASVs you really need a GPU.

YJ781 commented 1 year ago

Hi @MaximilianPi,

Yes, it looks like a GPU is necessary for me. Thanks a ton!