Doodleverse / segmentation_gym

A neural gym for training deep learning models to carry out geoscientific image segmentation. Works best with labels generated using https://github.com/Doodleverse/dash_doodler
MIT License

integrate transformers library in conda env for `segformer` model option #119

Closed dbuscombe-usgs closed 1 year ago

dbuscombe-usgs commented 1 year ago

the segformer model is now fully integrated, but there remain some issues with the conda environment

In https://github.com/Doodleverse/segmentation_gym/issues/115 @CameronBodine noted

... it threw an error (see below). I again had issues with not finding libcuda library, similar to what I noted on https://github.com/Doodleverse/segmentation_gym/issues/78 , so I went through the process of re-installing cuda and nvidia on my device (see https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html).

I didn't get the error on my Ubuntu box but did on my Windows box. I'm looking for a conda env workaround

CameronBodine commented 1 year ago

I believe this is my recipe for install on Windows:

  1. Made sure NVIDIA drivers were up to date.
  2. Install and set libmamba as the default environment solver in base environment following this.
  3. Install everything else except tensorflow as recommended in #78, except I did python=3.9 instead of 3.8. It may work with 3.10 but I have not tested:
    conda create -n gym python=3.9
    conda activate gym
    conda install -c conda-forge scipy "numpy>=1.16.5, <=1.23.0" scikit-image cython ipython joblib tqdm pandas pip plotly natsort pydensecrf matplotlib 
    pip install doodleverse_utils transformers
  4. Then I followed tensorflow install instructions for installing on Windows Native OS:
    conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
    # Anything above 2.10 is not supported on the GPU on Windows Native
    python -m pip install "tensorflow<2.11"
    # Verify install:
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
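The version pin in step 4 can also be checked from Python before training. This is only a minimal sketch: `tf_supports_native_windows_gpu` and `installed_tf_version` are hypothetical helpers, not part of Gym or TensorFlow, and the first one simply encodes the "tensorflow<2.11" rule from the recipe above.

```python
import re
from importlib import metadata


def tf_supports_native_windows_gpu(version: str) -> bool:
    """Return True if a TensorFlow version string is below 2.11.

    Hypothetical helper: it only encodes the "tensorflow<2.11" pin
    from the recipe above and does not query TensorFlow itself.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (2, 11)


def installed_tf_version():
    """Return the installed TensorFlow version, or None if it is absent."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_tf_version()
    if version is None:
        print("tensorflow is not installed in this environment")
    else:
        print(version, tf_supports_native_windows_gpu(version))
```

Run inside the activated `gym` environment, this reports the installed version and whether it falls under the pin. It does not confirm that a GPU is actually visible; the `list_physical_devices` command above does that.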

Hope this works for you!

dbuscombe-usgs commented 1 year ago

Thanks, I will give this a try!

This reminds me I need to nuke the 'pydensecrf' requirement from the docs

dbuscombe-usgs commented 1 year ago

I successfully installed the conda env, but it doesn't work. I get the same error

2023-02-24 11:51:04.512581: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
  /usr/local/cuda
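The warning lists the directories that were searched for `nvvm/libdevice`. A quick way to see whether any of them, or the active conda environment, actually holds that directory is sketched below. `find_libdevice_roots` is a hypothetical helper, and the conda locations it adds are an assumption rather than something confirmed in this thread.

```python
import os
from pathlib import Path


def find_libdevice_roots(candidates):
    """Return the candidate CUDA roots that contain nvvm/libdevice."""
    found = []
    for root in candidates:
        root = Path(root)
        if (root / "nvvm" / "libdevice").is_dir():
            found.append(root)
    return found


if __name__ == "__main__":
    # The three locations listed in the warning above.
    candidates = [
        "./cuda_sdk_lib",
        "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2",
        "/usr/local/cuda",
    ]
    # Assumption: the active conda environment may also hold a CUDA tree;
    # the exact subdirectory is not confirmed by the thread.
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        candidates += [prefix, os.path.join(prefix, "Library")]
    for root in find_libdevice_roots(candidates):
        print("libdevice found under", root)
```

If a root is found outside the searched locations, XLA has a `--xla_gpu_cuda_data_dir` option (set through the `XLA_FLAGS` environment variable) for pointing it at a CUDA directory. Whether that resolves this particular error was not tested in this thread.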

dbuscombe-usgs commented 1 year ago

I didn't first make sure NVIDIA drivers were up to date. I don't know how to do this, and don't remember ever having to do this before

CameronBodine commented 1 year ago

Try finding the driver here: https://www.nvidia.com/Download/index.aspx?lang=en-us

CameronBodine commented 1 year ago

FYI:

(gym) PS E:\Python\segmentation_gym> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243
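Note that this `nvcc` reports release 10.1, while the recipe earlier in the thread installs `cudatoolkit=11.2` into the conda environment, so the system-wide toolkit and the one in the environment are different versions. The sketch below pulls the release number out of `nvcc -V` output so the two can be compared. `nvcc_release` is a hypothetical helper, and the thread does not establish whether this mismatch is related to the error.

```python
import re


def nvcc_release(nvcc_output: str):
    """Extract (major, minor) from the 'release X.Y' line of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Output pasted from the comment above.
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243"""

release = nvcc_release(SAMPLE)
print(release)             # (10, 1)
print(release >= (11, 2))  # False
```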

dbuscombe-usgs commented 1 year ago

Ugh, windows

dbuscombe-usgs commented 1 year ago

`conda install -n base conda-libmamba-solver` fails too

CameronBodine commented 1 year ago

Sucky. Maybe fresh miniconda install?

dbuscombe-usgs commented 1 year ago

Hmmm. I shouldn't need to update my drivers, or reinstall conda. That would be too disruptive for me. I'm going to see if I can figure out a conda solution

CameronBodine commented 1 year ago

FYI:

(gym) PS E:\Python\segmentation_gym> conda info

     active environment : gym
    active env location : C:\Users\csb67\AppData\Local\miniconda3\envs\gym
            shell level : 2
       user config file : C:\Users\csb67\.condarc
 populated config files : C:\Users\csb67\.condarc
          conda version : 23.1.0
    conda-build version : not installed
         python version : 3.10.9.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=12.0=0
                          __win=0=0
       base environment : C:\Users\csb67\AppData\Local\miniconda3  (writable)
      conda av data dir : C:\Users\csb67\AppData\Local\miniconda3\etc\conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
                          https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : C:\Users\csb67\AppData\Local\miniconda3\pkgs
                          C:\Users\csb67\.conda\pkgs
                          C:\Users\csb67\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\csb67\AppData\Local\miniconda3\envs
                          C:\Users\csb67\.conda\envs
                          C:\Users\csb67\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/23.1.0 requests/2.28.1 CPython/3.10.9 Windows/10 Windows/10.0.19044 solver/libmamba conda-libmamba-solver/22.8.1 libmambapy/1.3.1
          administrator : False
             netrc file : None
           offline mode : False

I do believe the change to python 3.10 was a significant one. I have had to reinstall miniconda on both of my Windows computers recently. For what it's worth!

dbuscombe-usgs commented 1 year ago

I think I've now exhausted all options except updating nvidia or conda, which I'm not currently prepared to do. I suppose I will not make segformer models on windows

dbuscombe-usgs commented 1 year ago

I was able to install miniconda and use Cam's mamba recipe to install a gym environment. It works with the Unets, but not the segformer model. I'll keep troubleshooting

CameronBodine commented 1 year ago

I did notice in my PINGMapper.yml that I list installing transformers after tensorflow. Perhaps an order of operations thing??

name: ping
channels:
  - conda-forge
  - defaults
dependencies:
  - python
  - pandas
  - rasterio
  - pyproj
  - scikit-image
  - joblib
  - gdal
  - matplotlib
  - pip
  - pip:
      - psutil
      - tensorflow
      - transformers

dbuscombe-usgs commented 1 year ago

With miniconda and mamba, I can now get a working environment for training any unet models, in python 3.8, 3.9, and 3.10. I can do this using either the pip or conda way of installing TF, or the conda-forge way.

The only issue is using SegFormer models. It errors out with the same message every time, regardless of whether I install transformers using pip or conda, before or after TF.

I have not been able to update my nvidia drivers. I simply can't find a link that will allow me to install something to "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2", which is where transformers expects the cuda drivers to be (what am I missing?)

dbuscombe-usgs commented 1 year ago

I thought one major advantage with installing TF the "conda-forge" route was not having to update nvidia drivers on windows.

If I attempt to install the 11.2 cuda toolkit, from here: https://developer.nvidia.com/cuda-11.2.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal

I get warning messages saying that I'm about to downgrade NVIDIA versions, which doesn't seem right

Neither of my Windows computers, both of which have gym environments working well for Unets, has the nvcc program. I'm very reluctant to go this route right now, for fear that I will break my working conda envs

dbuscombe-usgs commented 1 year ago

Eureka!! Add this to the conda env to make it use segformers

conda install cuda -c nvidia
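After adding the package, a quick sanity check of the environment might look like the sketch below. `missing_packages` is a hypothetical helper, not part of Gym. It only confirms that `tensorflow` and `transformers` can be found and, if so, repeats the GPU check from the install recipe. It does not exercise a SegFormer model.

```python
from importlib import util


def missing_packages(names=("tensorflow", "transformers")):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        # Same GPU check as in the install recipe earlier in the thread.
        import tensorflow as tf
        print(tf.config.list_physical_devices("GPU"))
```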

I will update the README

dbuscombe-usgs commented 1 year ago

Ok, I posted new conda recipes for Gym that allow for use of segformers on windows and ubuntu

Thanks @CameronBodine for helping troubleshoot and test!

https://github.com/Doodleverse/segmentation_gym#%EF%B8%8F-installation