Doodleverse / segmentation_gym

A neural gym for training deep learning models to carry out geoscientific image segmentation. Works best with labels generated using https://github.com/Doodleverse/dash_doodler
MIT License

Installing Gym dependencies with conda recipe instead of yml #78

Closed. venuswku closed this issue 2 years ago

venuswku commented 2 years ago

Is your feature request related to a problem? Please describe.

When I was installing the dependencies for Gym, the install kept getting stuck at the "Solving environment" step after I ran `conda env create --file install/gym.yml`. I let it run for an hour and then aborted it. I also ran `conda install -c conda-forge mamba` so that I could install the dependencies faster with mamba (as the README suggests), but I encountered the same problem.

Describe the solution you'd like.

I'd like to use a working conda recipe to install Gym's dependencies. The following commands work for me:

conda create -n gym python=3.10
conda activate gym
conda install -c conda-forge scipy numpy scikit-image cython ipython joblib tqdm pandas pip plotly natsort pydensecrf matplotlib -y
pip install doodleverse_utils tensorflow-gpu

My whole installation process took about 10 minutes, and it installed 154 total packages.

Describe alternatives you've considered.

I also tried the following commands to test if installing using mamba is actually faster than conda:

conda create -n gym2 python=3.10
conda activate gym2
conda install -c conda-forge mamba
mamba install -c conda-forge scipy numpy scikit-image cython ipython joblib tqdm pandas pip plotly natsort pydensecrf matplotlib -y
pip install doodleverse_utils tensorflow-gpu

This installation process actually took me a little bit more time (about 12 minutes), which makes sense because mamba installed 37 extra packages (191 total). Both installations went well for me! I personally would prefer to install everything with conda to avoid installing extra packages, but the mamba documentation says that it offers more "reliable environment solutions" so maybe its installed packages are more compatible with one another.

Additional context

Can someone test if my conda recipes work for them too? I'm still figuring out which dependencies caused the yml installation to take so long. I'll report here if I find anything! Here are the installed dependencies and other info (version, build, channel) for my conda installation: conda.txt. And here are the installed dependencies for my mamba installation: mamba.txt.
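
One way to see which extra packages mamba pulled in is to diff the two package lists. This is only a minimal sketch: it assumes conda.txt and mamba.txt are in the format printed by `conda list` (header lines starting with `#`, then whitespace-separated name, version, build and channel columns), and the function names `parse_conda_list` and `extra_packages` are hypothetical helpers, not part of Gym or conda.

```python
def parse_conda_list(text):
    """Return {package name: version} from `conda list`-style text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that `conda list` prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def extra_packages(base_text, other_text):
    """Names present in other_text but not in base_text, sorted."""
    base = parse_conda_list(base_text)
    other = parse_conda_list(other_text)
    return sorted(set(other) - set(base))
```

With the two attachments saved locally, `extra_packages(open("conda.txt").read(), open("mamba.txt").read())` would list the packages that appear only in the mamba environment.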

venuswku commented 2 years ago

I think the yml installation took so long because it tried to install the tensorflow-gpu package using conda. When I tried to run `conda install -c conda-forge tensorflow-gpu`, `conda install -c anaconda tensorflow-gpu`, or `conda install tensorflow-gpu`, they all got stuck at the "Solving environment" step. Only `pip install tensorflow-gpu` worked for me. Is this happening for others?
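
To compare these commands without waiting an hour on each, a solve can be attempted under a time limit. The sketch below is only an illustration: `finishes_within` and `CANDIDATES` are hypothetical names, it assumes a `conda` executable is on the PATH, and it adds conda's `--dry-run` flag so that nothing is actually installed.

```python
import subprocess


def finishes_within(command, timeout_seconds):
    """Run command; return True if it exits with status 0 before the timeout."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires.
        return False
    return result.returncode == 0


# The three commands tried above, with --dry-run added (an assumption of this
# sketch) so that the solver runs but no packages are installed.
CANDIDATES = [
    ["conda", "install", "-c", "conda-forge", "tensorflow-gpu", "--dry-run"],
    ["conda", "install", "-c", "anaconda", "tensorflow-gpu", "--dry-run"],
    ["conda", "install", "tensorflow-gpu", "--dry-run"],
]
```

Inside the activated environment, something like `for cmd in CANDIDATES: print(cmd, finishes_within(cmd, 600))` would show which variants solve within ten minutes.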

dbuscombe-usgs commented 2 years ago

I tested your recipe on Ubuntu 22.04 and it was very fast (< 5 minutes). I tested the new env using `from doodleverse_utils.imports import *` and all dependencies were found.

tf.__version__ = 2.10.0

HOWEVER, as with CoastSeg (specifically https://github.com/SatelliteShorelines/CoastSeg/issues/68), the pip-installed TensorFlow cannot use the GPU:

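import tensorflow as tf
# List the GPUs that TensorFlow can see; an empty list means none were found.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print(physical_devices)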

outputs

2022-09-12 16:16:06.701123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-12 16:16:06.701330: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701417: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701494: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701599: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701702: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701807: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701911: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701969: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701977: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

which means TensorFlow did not find the CUDA toolkit libraries and therefore sees no GPU.
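
The same check can be made without importing TensorFlow by asking the dynamic loader for each library named in the warnings. This is a rough sketch, assuming a Linux system where `ctypes.CDLL` searches the same loader paths; `missing_libraries` is a hypothetical helper, and it only approximates what TensorFlow's own loader does.

```python
import ctypes

# Library names copied from the dso_loader warnings in the log above.
CUDA_LIBRARIES = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def missing_libraries(names):
    """Return the names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # Raised when the shared object cannot be found or loaded.
            missing.append(name)
    return missing
```

Running `print(missing_libraries(CUDA_LIBRARIES))` inside the environment should print an empty list once the CUDA toolkit and cuDNN are visible to the loader.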

So I had to `pip uninstall tensorflow-gpu`, and my only other real option seems to be `conda install -c conda-forge tensorflow-gpu`, which I realize was problematic for you.

It was also problematic for me: conda could not solve the environment.

@venuswku, can you please come up with a new recipe, this time for python=3.8? As with CoastSeg, I have a hunch that python 3.10 is just too new.

venuswku commented 2 years ago

Thanks for trying it out!! I'll try making a conda recipe with python=3.8 and tensorflow-gpu. @dbuscombe-usgs, did you try to see if `conda install -c anaconda tensorflow-gpu` or `conda install tensorflow-gpu` works for you?

Turns out that my system only had problems installing tensorflow-gpu. Installing tensorflow by running `conda install -c conda-forge tensorflow` works fine for me.

dbuscombe-usgs commented 2 years ago

No, conda install of tensorflow would not resolve for me using python 3.10

venuswku commented 2 years ago

@dbuscombe-usgs, can you see if the following recipe works for you? The main difference from the previous recipes is that it uses python=3.8 and allows the user to install the TensorFlow package that corresponds with their system.

conda create -n gym python=3.8
conda activate gym
conda install -c conda-forge scipy "numpy>=1.16.5, <=1.23.0" scikit-image cython ipython joblib tqdm pandas pip plotly natsort pydensecrf matplotlib
pip install doodleverse_utils

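Then run one of the following two commands:

* `conda install -c conda-forge tensorflow-gpu` if you have a CUDA-enabled GPU

* `conda install -c conda-forge tensorflow` if you have a CPU
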
My CPU system had problems installing tensorflow-gpu but successfully installed tensorflow, so I put the option of choosing to install tensorflow-gpu or tensorflow depending on the user's system.
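
The numpy pin in the recipe above (`numpy>=1.16.5, <=1.23.0`) can be checked against an existing environment. A minimal sketch, assuming plain numeric version strings such as `1.23.0`; `parse_version` and `satisfies_numpy_pin` are hypothetical helpers written for this illustration.

```python
def parse_version(version):
    """Turn '1.23.0' into (1, 23, 0); assumes numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def satisfies_numpy_pin(version, lowest="1.16.5", highest="1.23.0"):
    """True if lowest <= version <= highest, matching the pin in the recipe."""
    return parse_version(lowest) <= parse_version(version) <= parse_version(highest)
```

For example, `satisfies_numpy_pin(numpy.__version__)` reports whether the installed numpy falls inside the pinned range. Pre-release strings such as `1.23.0rc1` are not handled and would raise a `ValueError`.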

dbuscombe-usgs commented 2 years ago

Added recipe in latest commit https://github.com/Doodleverse/segmentation_gym/commit/8ca5788e0e15bd675fbe641e562934c081767d5f

CameronBodine commented 1 year ago

@dbuscombe-usgs, can you see if the following recipe works for you? The main difference from the previous recipes is that it uses python=3.8 and allows the user to install the TensorFlow package that corresponds with their system.

conda create -n gym python=3.8
conda activate gym
conda install -c conda-forge scipy "numpy>=1.16.5, <=1.23.0" scikit-image cython ipython joblib tqdm pandas pip plotly natsort pydensecrf matplotlib
pip install doodleverse_utils

Then run one of the following two commands:

* `conda install -c conda-forge tensorflow-gpu` if you have a CUDA-enabled GPU

* `conda install -c conda-forge tensorflow` if you have a CPU

My CPU system had problems installing tensorflow-gpu but successfully installed tensorflow, so I put the option of choosing to install tensorflow-gpu or tensorflow depending on the user's system.

On Ubuntu 22.04, I additionally had to specify installation of tensorflow==2.10 with:

conda install -c conda-forge tensorflow-gpu=2.10

Without specifying the version, tensorflow v2.11 would install, resulting in issues locating libdevice.
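
For anyone scripting the install, the pinned command can be built from a single flag. This is a sketch only: `tensorflow_install_command` is a hypothetical helper, and applying the same 2.10 pin to the CPU-only `tensorflow` package is an assumption, since the pin above was only tested with `tensorflow-gpu`.

```python
def tensorflow_install_command(has_cuda_gpu, version="2.10"):
    """Build the conda-forge install command for a pinned TensorFlow version."""
    # Assumption: the CPU package takes the same pin as tensorflow-gpu.
    package = "tensorflow-gpu" if has_cuda_gpu else "tensorflow"
    return ["conda", "install", "-c", "conda-forge", f"{package}={version}"]
```

The returned list can be passed to `subprocess.run` inside the activated gym environment.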