Closed venuswku closed 2 years ago
I think the `yml` installation took so long because it tried to install the `tensorflow-gpu` package using `conda`. When I tried to run `conda install -c conda-forge tensorflow-gpu`, `conda install -c anaconda tensorflow-gpu`, or `conda install tensorflow-gpu`, they all got stuck at the "Solving environment" step. Only running `pip install tensorflow-gpu` worked for me. Is this happening for others?
I tested your recipe on Ubuntu 22.04 and it was very fast (< 5 minutes). I tested the new env using `from doodleverse_utils.imports import *` and all dependencies were found; `tf.__version__` = 2.10.0.
HOWEVER, as with CoastSeg (specifically https://github.com/SatelliteShorelines/CoastSeg/issues/68), the pip-installed versions are not usable:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print(physical_devices)
outputs
2022-09-12 16:16:06.701123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-12 16:16:06.701330: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701417: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701494: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701599: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701702: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701807: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701911: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701969: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-09-12 16:16:06.701977: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
which means it did not find the CUDA toolkit and therefore sees no GPU.
So I had to `pip uninstall tensorflow-gpu`, and my only other real option seems to be `conda install -c conda-forge tensorflow-gpu`, which I realize was problematic for you...
... it was also problematic for me: it did not solve the environment.
@venuswku, could you please come up with a new recipe, this time for `python=3.8`? As with CoastSeg, I have a hunch that Python 3.10 is just too new.
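For anyone triaging a similar log, here is a minimal sketch that pulls the missing library names out of the `dso_loader` warnings shown above. It is not part of Gym or `doodleverse_utils`; the function name `missing_cuda_libraries` is hypothetical, and it assumes the warnings keep the "Could not load dynamic library '...'" wording seen in this thread.

```python
import re
from typing import List

# Matches TensorFlow dso_loader warnings of the form:
#   Could not load dynamic library 'libcudart.so.11.0'; dlerror: ...
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_cuda_libraries(log_text: str) -> List[str]:
    """Return the library names reported as unloadable, in order, without repeats."""
    names: List[str] = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in names:
            names.append(name)
    return names
```

Pasting the log above into this function should list the CUDA and cuDNN libraries that TensorFlow could not open. That makes it easier to see that the whole toolkit is missing rather than a single file.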
Thanks for trying it out!! I'll try making a conda recipe with `python=3.8` and `tensorflow-gpu`. @dbuscombe-usgs, did you check whether `conda install -c anaconda tensorflow-gpu` or `conda install tensorflow-gpu` works for you?
Turns out that my system only had problems installing `tensorflow-gpu`. Installing `tensorflow` by running `conda install -c conda-forge tensorflow` works fine for me.
No, conda install of tensorflow would not resolve for me using python 3.10
@dbuscombe-usgs, can you see if the following recipe works for you? The main difference from the previous recipes is that it uses `python=3.8` and lets users install the TensorFlow package that matches their system.
conda create -n gym python=3.8
conda activate gym
conda install -c conda-forge scipy "numpy>=1.16.5, <=1.23.0" scikit-image cython ipython joblib tqdm pandas pip plotly natsort pydensecrf matplotlib
pip install doodleverse_utils
Then run one of the following two commands:
* `conda install -c conda-forge tensorflow-gpu` if you have a CUDA-enabled GPU
* `conda install -c conda-forge tensorflow` if you only have a CPU

My CPU system had problems installing `tensorflow-gpu` but successfully installed `tensorflow`, so I added the option of installing either `tensorflow-gpu` or `tensorflow` depending on the user's system.
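The choice between the two commands could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `conda_tensorflow_command` is hypothetical, and using the presence of `nvidia-smi` on the `PATH` as a stand-in for "has a CUDA-enabled GPU" is a rough heuristic, not something the recipe itself does.

```python
import shutil
from typing import List, Optional


def conda_tensorflow_command(has_gpu: Optional[bool] = None) -> List[str]:
    """Build the conda command for the TensorFlow package that matches the system.

    If has_gpu is None, guess from whether `nvidia-smi` is on the PATH
    (an assumption: a driver tool being present does not guarantee a usable GPU).
    """
    if has_gpu is None:
        has_gpu = shutil.which("nvidia-smi") is not None
    package = "tensorflow-gpu" if has_gpu else "tensorflow"
    return ["conda", "install", "-c", "conda-forge", package]


if __name__ == "__main__":
    # Print the command instead of running it, so the user can review it first.
    print(" ".join(conda_tensorflow_command()))
```

Printing the command rather than executing it keeps the final decision with the user, which matters here because the GPU guess can be wrong.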
Added recipe in latest commit https://github.com/Doodleverse/segmentation_gym/commit/8ca5788e0e15bd675fbe641e562934c081767d5f
On Ubuntu 22.04, I additionally had to specify installation of tensorflow==2.10 with:
conda install -c conda-forge tensorflow-gpu=2.10
Without specifying the version, tensorflow v2.11 would install, resulting in issues locating libdevice.
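After installing, a quick sanity check along these lines may help confirm that the pinned version is the one that was installed and that a GPU is visible. It is a sketch, not part of Gym: the helper names `is_pinned_minor` and `check_tensorflow` are hypothetical, and TensorFlow is imported lazily so the version comparison can be used without it.

```python
from typing import List, Tuple


def is_pinned_minor(version: str, pinned: str = "2.10") -> bool:
    """True if `version` (e.g. '2.10.0') has the same major.minor as `pinned`."""
    return version.split(".")[:2] == pinned.split(".")[:2]


def check_tensorflow(pinned: str = "2.10") -> Tuple[bool, List[str]]:
    """Return (version matches the pin, names of visible GPU devices)."""
    import tensorflow as tf  # imported here so the module loads without TensorFlow

    gpus = tf.config.list_physical_devices("GPU")
    return is_pinned_minor(tf.__version__, pinned), [gpu.name for gpu in gpus]
```

If `check_tensorflow()` reports a version mismatch, or an empty device list on a machine with a GPU, that points back to the install step rather than to Gym itself.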
Is your feature request related to a problem? Please describe.
When I was installing the dependencies for Gym, I kept getting stuck at the "Solving environment" step when I ran the command `conda env create --file install/gym.yml`. I let it run for an hour and then aborted the install. I also tried running `conda install -c conda-forge mamba` so that I could install the dependencies more quickly with `mamba` (as the README suggests), but I encountered the same problem.
Describe the solution you'd like.
I'd like to use a working `conda` recipe to install Gym's dependencies. The following commands are what work for me:
My whole installation process took about 10 minutes, and it installed 154 total packages.
Describe alternatives you've considered.
I also tried the following commands to test whether installing with `mamba` is actually faster than `conda`:
This installation process actually took a little more time (about 12 minutes), which makes sense because `mamba` installed 37 extra packages (191 total). Both installations went well for me! I personally would prefer to install everything with `conda` to avoid installing extra packages, but the `mamba` documentation says that it offers more "reliable environment solutions", so maybe its installed packages are more compatible with one another.
Additional context
Can someone test whether my `conda` recipes work for them too? Currently, I'm still figuring out which dependencies were causing the `yml` installation to take so long. I'll report here if I find anything! Here are the installed dependencies and other info (version, build, channel) for my `conda` installation: conda.txt. And here are the installed dependencies for my `mamba` installation: mamba.txt.