cassianobecker / dnn


update environment to support CUDA on CUBIC #19

Open cassianobecker opened 4 years ago

cassianobecker commented 4 years ago

Some packages needed to be installed or downgraded to support the currently installed CUDA version on CUBIC.

Our current environment.yml file needs to be updated to reflect those requirements.

Here is the list of what I did manually to get it working (might not be exhaustive):

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
conda install torchvision==0.5.0 -c pytorch
conda install pillow==6.2.1 -c pytorch
conda install pandas
conda install libgcc
pip install boto3

And attached is the complete list of the current working environment: conda_list.txt

harangju commented 4 years ago

Conversation on Slack (from cassiano):

Harang, once you update your environment, could you try to run the current version of the code? You can do this by first pre-processing some subjects by running dnn/experiments/hcp/process_dti.py (remember to include your AWS credentials, saved in the file ~/.aws/credentials, as described in the sprint2.txt document). Then you can train a network by running dnn/experiments/hcp/hcp_dti_cnn.py.

Make sure the subject lists for train and test (under experiments/conf) are subsets of the successfully pre-processed subjects. After running process_dti.py, those subjects are available as saved tensors in the subject folders inside /HCP_1200_tensor (the location is specified in dataset/hcp/conf/hcp_database.ini). For more meaningful training results, you may want to increase the number of pre-processed subjects. A full list is available in dataset/hcp/res/all_subjects.txt. Please let me know how else I can help. Thanks!
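The subset requirement above can be checked before training. The following is only a minimal sketch: the function name `missing_subjects` is hypothetical, and it assumes the subject list files hold one subject ID per line and that a subject counts as pre-processed when its folder under the tensor directory exists and is not empty. The actual file layout written by process_dti.py may differ.

```python
from pathlib import Path


def missing_subjects(subject_list_file, tensor_dir):
    """Return the subject IDs from the list that have no usable tensor folder.

    Assumption: one subject ID per line in the list file, and one folder per
    subject (named after the ID) inside tensor_dir.
    """
    root = Path(tensor_dir)
    missing = []
    for line in Path(subject_list_file).read_text().splitlines():
        subject = line.strip()
        if not subject:
            continue  # skip blank lines
        folder = root / subject
        # A subject is treated as missing if its folder is absent or empty.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(subject)
    return missing
```

Calling it once for the train list and once for the test list, with the tensor directory taken from hcp_database.ini, should return an empty list when every listed subject has been pre-processed.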

harangju commented 4 years ago

what's the difference between local_processing_directory and local_server_directory in hcp_database.ini? @cassianobecker

cassianobecker commented 4 years ago

Looks like the package updates, especially the one related to downgrading pytorch versions to run on CBICA:

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
conda install torchvision==0.5.0 -c pytorch
conda install pillow==6.2.1 -c pytorch

did not make it to this pull request (please refer to file attached with the output of a conda list command of a current working python environment at CBICA).

We could probably have two different yml files, or some form of modularity and switching within a single file (I believe that can be done, but not sure how).

harangju commented 4 years ago

Do you have to downgrade pytorch for CBICA? I ran into problems trying to specify the versions.

With regard to having two yml files, we can have something like local.yml for our local machines and cbica.yml for the cluster.

cassianobecker commented 4 years ago

Good point. The reason for downgrading was to be compatible with the CUDA versions on the cluster, the most recent of which is 9.2. I dug a little further and found that the current stable PyTorch version, 1.4, can support it if this option is stated explicitly. I tried it (now upgrading) via:

conda install pytorch torchvision cudatoolkit=9.2 -c pytorch

and it worked, so we can use the more recent version of PyTorch, which is better. The only thing to worry about is how to specify this cudatoolkit switch. In the worst case (because right now it is the only aspect that appears to differ between the local and CBICA environments), we can leave the yml as is and add an instruction to run the above conda install command after each person installs the dnn environment on CBICA.

harangju commented 4 years ago

Great!

Hmm an extra instruction would work. We could also just make a copy of the yml file and add =9.2 to cudatoolkit.
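Making that copy could also be scripted. The sketch below is hypothetical (the helper `pin_dependencies` is not part of the repository) and assumes the usual environment.yml layout, with a top-level `dependencies:` list of `- name` or `- name=version` entries. It only rewrites entries at the top level of that list, so channel names and packages nested under `- pip:` are left alone.

```python
import re

# Matches a list item such as "  - cudatoolkit" or "  - python=3.6".
ITEM = re.compile(r"^(\s*)-\s*([A-Za-z0-9_.\-]+)\s*([=<>!].*)?$")


def pin_dependencies(yml_text, pins):
    """Return yml_text with the given conda dependencies pinned to a version."""
    section = None     # current top-level key, e.g. "channels"
    dep_indent = None  # indentation of the top-level dependency items
    out = []
    for line in yml_text.splitlines():
        top = re.match(r"^([A-Za-z_]+):", line)
        if top:
            section = top.group(1)
        if section == "dependencies":
            dash = re.match(r"^(\s*)-", line)
            if dash and dep_indent is None:
                dep_indent = dash.group(1)
            m = ITEM.match(line)
            # Only touch top-level dependency items named in pins.
            if m and m.group(1) == dep_indent and m.group(2) in pins:
                line = f"{m.group(1)}- {m.group(2)}={pins[m.group(2)]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, `pin_dependencies(text, {"cudatoolkit": "9.2"})` applied to the contents of environment.yml would produce the text for the cluster copy, which can then be written to a second file.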

cassianobecker commented 4 years ago

As we discussed, the latest PyTorch/CUDA versions do not seem to work with the P100 GPUs in the current CBICA configuration. However, after many trials and variations, I got things running with PyTorch 1.1.0 and CUDA 9.0. So I suggest we include a second yml configuration specific to CBICA (maybe call it environment_cbica.yml?) in our project. Attached is the output of conda list for the environment that is currently working in my CBICA account.

conda_list_cbica_dnn2.txt

harangju commented 4 years ago

oops. I accidentally pushed my changes to test-env-cuda and master instead of just to test-env-cuda.

cassianobecker commented 4 years ago

Here's the updated listing for the current functional environment dnn2:

`(dnn2) ~ » conda list
# packages in environment at /cbica/home/beckerc/.conda/envs/dnn2:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_llvm conda-forge
alabaster 0.7.12 py_0 conda-forge
asn1crypto 1.3.0 py36_0
attrs 19.3.0 py_0 conda-forge
babel 2.8.0 py_0 conda-forge
backcall 0.1.0 py_0 conda-forge
blas 1.0 openblas
bleach 3.1.1 py_0 conda-forge
boto3 1.12.25 pypi_0 pypi
botocore 1.15.25 pypi_0 pypi
ca-certificates 2020.1.1 0
certifi 2020.4.5.1 py36_0
cffi 1.14.0 py36h2e261b9_0
chardet 3.0.4 py36_1003
cryptography 2.8 py36h1ba5d50_0
cudatoolkit 9.0 h13b8566_0
cycler 0.10.0 py_2 conda-forge
dbus 1.13.6 he372182_0 conda-forge
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
dipy 1.1.1 pypi_0 pypi
docutils 0.15.2 pypi_0 pypi
entrypoints 0.3 py36_0
expat 2.2.9 he1b5a44_2 conda-forge
fontconfig 2.13.0 h9420a91_0
freetype 2.10.0 he983fc9_1 conda-forge
gettext 0.19.8.1 hc5be6a0_1002 conda-forge
giflib 5.2.1 h516909a_2 conda-forge
glib 2.63.1 h5a9c865_0
gst-plugins-base 1.14.5 h0935bb2_2 conda-forge
gstreamer 1.14.5 h36ae1b5_2 conda-forge
h5py 2.10.0 py36h7918eee_0
hdf5 1.10.4 nompi_h3c11f04_1106 conda-forge
icu 58.2 h9c2bf20_1
idna 2.9 py_1 conda-forge
imagesize 1.2.0 py_0 conda-forge
importlib_metadata 1.5.0 py36_0
ipykernel 5.1.4 py36h39e3cac_0
ipython 7.13.0 py36h5ca1d4c_0
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.5.1 py_0 conda-forge
jedi 0.16.0 py36_0
jinja2 2.11.1 py_0 conda-forge
jmespath 0.9.5 pypi_0 pypi
joblib 0.14.1 pypi_0 pypi
jpeg 9c h14c3975_1001 conda-forge
json5 0.9.0 py_0 conda-forge
jsonschema 3.2.0 py36_0
jupyter_client 6.0.0 py_0 conda-forge
jupyter_contrib_core 0.3.3 py_2 conda-forge
jupyter_core 4.6.1 py36_0
jupyterlab 2.0.1 py_0 conda-forge
jupyterlab_server 1.0.7 py_0 conda-forge
kiwisolver 1.1.0 py36he6710b0_0
ld_impl_linux-64 2.33.1 h53a641e_8 conda-forge
libblas 3.8.0 16_openblas conda-forge
libcblas 3.8.0 16_openblas conda-forge
libclang 9.0.1 default_hde54327_0 conda-forge
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 he1b5a44_1006 conda-forge
libgcc 7.2.0 h69d50b8_2
libgcc-ng 9.2.0 h24d8f2e_2 conda-forge
libgfortran-ng 7.3.0 hdf63c60_5 conda-forge
libiconv 1.15 h516909a_1005 conda-forge
liblapack 3.8.0 16_openblas conda-forge
libllvm9 9.0.1 hc9558a2_0 conda-forge
libopenblas 0.3.9 h5ec1e0e_0 conda-forge
libpng 1.6.37 hed695b0_0 conda-forge
libsodium 1.0.17 h516909a_0 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
libtiff 4.1.0 hc7e4089_4 conda-forge
libuuid 1.0.3 h1bed415_2
libwebp 1.1.0 h56121f0_2 conda-forge
libwebp-base 1.1.0 2 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxkbcommon 0.10.0 he1b5a44_0 conda-forge
libxml2 2.9.9 hea5a465_1
libxslt 1.1.33 h7d1a2b0_0
llvm-openmp 9.0.1 hc9558a2_2 conda-forge
lxml 4.5.0 py36hefd8a0e_0
lz4-c 1.8.3 he1b5a44_1001 conda-forge
markupsafe 1.1.1 py36h7b6447c_0
matplotlib 3.1.3 py36_0
matplotlib-base 3.1.3 py36hef1b27d_0
mistune 0.8.4 py36h7b6447c_0
mkl 2020.0 166 conda-forge
nbconvert 5.6.1 py36_0
nbformat 5.0.4 py_0 conda-forge
ncurses 6.1 hf484d3e_1002 conda-forge
nibabel 3.0.2 py_0 conda-forge
ninja 1.10.0 hc9558a2_0 conda-forge
notebook 6.0.3 py36_0
nspr 4.25 he1b5a44_0 conda-forge
nss 3.47 he751ad9_0 conda-forge
numpy 1.18.1 py36h94c655d_0
numpy-base 1.18.1 py36h2f8d375_1
olefile 0.46 py_0 conda-forge
openssl 1.1.1f h7b6447c_0
packaging 20.1 py_0 conda-forge
pandas 1.0.3 pypi_0 pypi
pandoc 2.9.2 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parso 0.6.2 py_0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
pexpect 4.8.0 py36_0
pickleshare 0.7.5 py36_0
pillow 6.2.1 py36h34e0f95_0
pip 20.0.2 py_2 conda-forge
plotly 4.5.4 py_0 plotly
plotly-orca 1.3.0 1 plotly
prometheus_client 0.7.1 py_0 conda-forge
prompt-toolkit 3.0.4 py_0 conda-forge
prompt_toolkit 3.0.4 0 conda-forge
psutil 5.7.0 py36h7b6447c_0
pthread-stubs 0.4 h14c3975_1001 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
pycparser 2.20 py_0 conda-forge
pydicom 1.4.2 py_0 conda-forge
pygments 2.6.1 py_0 conda-forge
pyopenssl 19.1.0 py_1 conda-forge
pyparsing 2.4.6 py_0 conda-forge
pyqt 5.9.2 py36h05f1152_2
pyrsistent 0.15.7 py36h7b6447c_0
pysocks 1.7.1 py36_0
python 3.6.10 h0371630_0
python-dateutil 2.8.1 py_0 conda-forge
pytorch 1.1.0 py3.6_cuda9.0.176_cudnn7.5.1_0 pytorch
pytz 2019.3 py_0 conda-forge
pyyaml 3.12 py36hafb9ca4_1
pyzmq 18.1.1 py36he6710b0_0
qt 5.9.7 h5867ecd_1
readline 7.0 h7b6447c_5
requests 2.23.0 pyh8c360ce_2 conda-forge
retrying 1.3.3 py_2 conda-forge
s3transfer 0.3.3 pypi_0 pypi
scikit-learn 0.22.2.post1 pypi_0 pypi
scipy 1.4.1 py36habc2bb6_0
send2trash 1.5.0 py_0 conda-forge
setuptools 46.0.0 py36_0
sip 4.19.8 py36hf484d3e_0
six 1.14.0 py_1 conda-forge
sklearn 0.0 pypi_0 pypi
snowballstemmer 2.0.0 py_0 conda-forge
sphinx 2.4.4 py_0 conda-forge
sphinxcontrib-applehelp 1.0.2 py_0 conda-forge
sphinxcontrib-devhelp 1.0.2 py_0 conda-forge
sphinxcontrib-htmlhelp 1.0.3 py_0 conda-forge
sphinxcontrib-jsmath 1.0.1 py_0 conda-forge
sphinxcontrib-qthelp 1.0.3 py_0 conda-forge
sphinxcontrib-serializinghtml 1.1.4 py_0 conda-forge
sqlite 3.31.1 h7b6447c_0
terminado 0.8.3 py36_0
testpath 0.4.4 py_0 conda-forge
tk 8.6.10 hed695b0_0 conda-forge
torchvision 0.3.0 py36_cu9.0.176_1 pytorch
tornado 6.0.4 py36h7b6447c_1
traitlets 4.3.3 py36_0
urllib3 1.25.8 py36_0
wcwidth 0.1.8 py_0 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.34.2 py_1 conda-forge
widgetsnbextension 3.5.1 py36_0
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xz 5.2.4 h14c3975_1001 conda-forge
yaml 0.2.2 h516909a_1 conda-forge
zeromq 4.3.2 he1b5a44_2 conda-forge
zipp 3.1.0 py_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.4 h3b9ef0a_1 conda-forge`
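A listing like this can be checked mechanically for the PyTorch/CUDA pairing discussed in this thread. The sketch below is an assumption-laden illustration, not project code: both function names are hypothetical, it expects the standard `conda list` layout with one package per line, and it relies on the heuristic that GPU builds of pytorch from the pytorch channel embed the CUDA version in the build string (as in `py3.6_cuda9.0.176_cudnn7.5.1_0`).

```python
def parse_conda_list(text):
    """Map package name -> (version, build, channel) from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip comment/header lines and anything too short to be a package row.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""  # channel is optional
        packages[name] = (version, build, channel)
    return packages


def pytorch_cuda_matches_toolkit(packages):
    """Heuristic: does the pytorch build string name the installed cudatoolkit?"""
    toolkit_version = packages["cudatoolkit"][0]
    pytorch_build = packages["pytorch"][1]
    return f"cuda{toolkit_version}" in pytorch_build
```

Feeding it the output of `conda list` saved to a text file gives a quick sanity check after rebuilding the environment on the cluster; it does not replace actually running a small tensor operation on the GPU.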