Open guchchcug opened 4 years ago
Hi @guchchcug, I've used multiple CPUs on Jetstream, transferred data to Bridges, and run deep learning on a GPU on Bridges in a Singularity container. I could come during lunch or an evening to go over it.
Do you have a GPU XSEDE resource to play on?
1. Built Singularity image (tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg) with the Singularity recipe:
Bootstrap: docker
From: nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

%post
    # Updating and getting required packages
    apt-get -y update
    apt-get -y upgrade
    apt-get install -y wget git vim python3 python3-pip python3-tk
    ln -s /usr/bin/python3 /usr/bin/python

    # Download and install Anaconda
    CONDA_INSTALL_PATH="/usr/local/anaconda3-4.2.0"
    wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
    chmod +x Anaconda3-4.2.0-Linux-x86_64.sh
    ./Anaconda3-4.2.0-Linux-x86_64.sh -b -p $CONDA_INSTALL_PATH

    export LC_ALL=C

    # Install Tensorflow
    pip3 install tensorflow-gpu==1.12.0
    pip3 install tensorflow-probability

    # Install other python modules
    pip3 install scipy
    pip3 install matplotlib==2.2.4
    pip3 install sklearn
    pip3 install pandas
    pip3 install Pillow
    pip3 install livelossplot
    pip3 install hyperas
    pip3 install GPy
    pip3 install GPyOpt
    pip3 install blinker
    pip3 install psutil
    pip3 install spacy

    # Install Keras
    pip3 install keras

# Run command defined in command line
%runscript
    exec "$@"
2. Transferred Singularity image to Bridges.
3. Start interactive GPU-AI job https://www.psc.edu/bridges/user-guide/running-jobs#ai
interact -p GPU-AI -A mc5phjp --gres=gpu:volta16:1
where `mc5phjp` is my project ID
4. Start environment
module load cuda/9.0
module load singularity/3.0.0
singularity shell --nv -B $SCRATCH tensorflow1.9.0-py3-cuda9.0-ubuntu16.04.simg
5. Test deep learning
python my_deep_learning_code.py
Or alternatively, use similar steps to submit a job with slurm.
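The Slurm route could look roughly like the sketch below, which only renders the text of a batch script. The partition, GPU resource string, modules, and bind mount are copied from the interactive steps above. The walltime, the `-N 1` node count, the use of `singularity exec` in place of `singularity shell`, and the helper name `render_batch_script` are assumptions, not something confirmed in this thread, so check the Bridges user guide for the options your allocation accepts.

```python
def render_batch_script(image, script, account, walltime="01:00:00"):
    """Return the text of a Slurm batch script (a sketch, not a tested job)."""
    lines = [
        "#!/bin/bash",
        "#SBATCH -p GPU-AI",             # partition used in the interactive example
        "#SBATCH --gres=gpu:volta16:1",  # one GPU, same resource string as above
        "#SBATCH -N 1",                  # assumption: a single node is enough
        "#SBATCH -t " + walltime,        # assumption: walltime limit
        "#SBATCH -A " + account,         # your own project ID
        "",
        "module load cuda/9.0",
        "module load singularity/3.0.0",
        "",
        # exec runs one command in the container instead of opening a shell
        "singularity exec --nv -B $SCRATCH {} python {}".format(image, script),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batch_script(
        "tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg",
        "my_deep_learning_code.py",
        "mc5phjp",
    ))
```

Saving the printed text to a file and passing that file to `sbatch` would submit it, provided the image and the Python script are in the submission directory.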
Hi Ariella,
Thank you for the information! It would be great if you could come to help!
I have GPU XSEDE resources on both Comet and Bridges. I’ll go through the procedure myself first, and I hope to go over it with you when you come! Thank you! :)
Chen
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cyber-carpentry/breakfast_carpentry/issues/10?email_source=notifications&email_token=ACUUFWK4GR5DRMH35SQ6CTTQBBPX5A5CNFSM4IF7KMQKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2WPC3A#issuecomment-514650476, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACUUFWIIMXMIJ6A4J5CNUO3QBBPX5ANCNFSM4IF7KMQA.
Hi Ariella,
I got the following error when I ran the Singularity file under my Bridges account.
[guchch@gpu051 ~]$ singularity shell --nv -B $SCRATCH my_first_tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg
FATAL: image format not recognized
ERROR : Child exit with status 255
Chen
P.S. The my_first_tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg file is as follows:
Bootstrap: docker
From: nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

%post
    apt-get -y update
    apt-get -y upgrade
    apt-get install -y wget git vim python3 python3-pip python3-tk
    ln -s /usr/bin/python3 /usr/bin/python

    CONDA_INSTALL_PATH="/usr/local/anaconda3-4.2.0"
    wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
    chmod +x Anaconda3-4.2.0-Linux-x86_64.sh
    ./Anaconda3-4.2.0-Linux-x86_64.sh -b -p $CONDA_INSTALL_PATH

    export LC_ALL=C

    pip3 install tensorflow-gpu==1.12.0
    pip3 install tensorflow-probability

    pip3 install scipy
    pip3 install matplotlib==2.2.4
    pip3 install sklearn
    pip3 install pandas
    pip3 install Pillow
    pip3 install livelossplot
    pip3 install hyperas
    pip3 install GPy
    pip3 install GPyOpt
    pip3 install blinker
    pip3 install psutil
    pip3 install spacy

    pip3 install keras

%runscript
    exec "$@"
Hmmm, I just double-checked on Bridges and it worked for me. Did you load the singularity module? What version of Singularity did you build the image with? Did you test the Singularity image wherever you originally built it? Also, I can just pass you my image...
Hi Ariella,
I used what you suggested:
module load cuda/9.0
module load singularity/3.0.0
Chen Gu, Ph.D.
Postdoctoral Associate
Earth Resources Laboratory
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology
77 Mass Ave., 54-617
Cambridge, MA 02139
Telephone: 617-253-7278
Cell: 617-416-6058
Email: guchch@mit.edu
Website: http://www.chenguatmit.com
Did you test your image in the environment where you originally created it?
I’m not sure if I created an image. I just put all the following text in a file named "my_first_tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg".
Bootstrap: docker
From: nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

%post
    apt-get -y update
    apt-get -y upgrade
    apt-get install -y wget git vim python3 python3-pip python3-tk
    ln -s /usr/bin/python3 /usr/bin/python

    CONDA_INSTALL_PATH="/usr/local/anaconda3-4.2.0"
    wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
    chmod +x Anaconda3-4.2.0-Linux-x86_64.sh
    ./Anaconda3-4.2.0-Linux-x86_64.sh -b -p $CONDA_INSTALL_PATH

    export LC_ALL=C

    pip3 install tensorflow-gpu==1.12.0
    pip3 install tensorflow-probability

    pip3 install scipy
    pip3 install matplotlib==2.2.4
    pip3 install sklearn
    pip3 install pandas
    pip3 install Pillow
    pip3 install livelossplot
    pip3 install hyperas
    pip3 install GPy
    pip3 install GPyOpt
    pip3 install blinker
    pip3 install psutil
    pip3 install spacy

    pip3 install keras

%runscript
    exec "$@"
Chen
Ah. Yes, that’s why it doesn’t think it’s a Singularity image. You need to build the Singularity image from that recipe, just like we built a Docker image from a Dockerfile. I’m not at a computer right now, so I can’t send you the commands to build it. Ask somebody how to build a Singularity image on Jetstream, then copy your image to Bridges. Or I can send you my image just for you to try it.
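A quick way to tell the two kinds of file apart before copying anything to Bridges is sketched below. It is only a heuristic resting on two assumptions: that a recipe is plain text containing keywords such as `Bootstrap:` or `%post`, and that a built image is binary and therefore has NUL bytes near its start. The function name `looks_like_recipe` is made up for this example. If it returns True, the file still needs to be built, typically with something like `sudo singularity build <image> <recipe>` on a machine where you have root.

```python
RECIPE_KEYWORDS = (b"bootstrap:", b"%post", b"%runscript")


def looks_like_recipe(path, probe_bytes=4096):
    """Heuristic: True if the file starts like a plain-text Singularity recipe."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    # Assumption: built images are binary, so a NUL byte means "not a recipe"
    if b"\x00" in head:
        return False
    lowered = head.lower()
    return any(keyword in lowered for keyword in RECIPE_KEYWORDS)
```

Calling `looks_like_recipe("my_first_tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg")` on a file that holds only the recipe text would return True, which matches the "image format not recognized" error above.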
Hi Ariella,
Thank you! I’ll build my Singularity image. Could you also send me your image so that I can try it?
Chen
Hi Ariella,
I found the procedures for building my singularity image: https://github.com/guchchcug/container_camp_workshop_2019/blob/master/singularity/singularityadvanced.rst
You can also try a Singularity image from Singularity Hub. I just tried out a random image tagged tensorflow on Singularity Hub (https://singularity-hub.org/search).
On the Bridges login node,
module load singularity/3.0.0
pull a tensorflow singularity image,
singularity pull shub://belledon/tensorflow-keras
start an interactive job
interact -p GPU-AI -A mc5phjp --gres=gpu:volta16:1
load necessary modules
module load singularity/3.0.0
module load cuda/9.0
Enter singularity image
singularity shell --nv -B $SCRATCH tensorflow-keras_latest.sif
Test if keras imports
Singularity tensorflow-keras_latest.sif:~> python
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
I did not verify that it uses the gpu, but I suspect it will.
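One way to check the GPU question from inside the container is sketched below. It relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, which is a semi-internal TensorFlow module, so treat its availability in this particular image as an assumption. The helper names `gpu_names` and `visible_gpus` are invented for this example.

```python
def gpu_names(devices):
    """Return the names of devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def visible_gpus():
    """List GPUs TensorFlow can see, or None if TensorFlow cannot be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_names(device_lib.list_local_devices())


if __name__ == "__main__":
    found = visible_gpus()
    if found is None:
        print("TensorFlow is not importable in this environment")
    elif found:
        print("GPUs visible to TensorFlow:", found)
    else:
        print("No GPU visible; was the container started with --nv?")
```

An empty list on a GPU node would suggest that the `--nv` flag or the CUDA module is missing, or that the image carries a CPU-only TensorFlow build.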
Hi Ariella,
I still get the following errors:
Singularity tensorflow-keras_latest.sif:/pylon5/ea5phhp/guchch/my_singularity> python MNIST-scratch.py
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
Traceback (most recent call last):
  File "/usr/lib/python3.5/urllib/request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/usr/lib/python3.5/http/client.py", line 1252, in connect
    super().connect()
  File "/usr/lib/python3.5/http/client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib/python3.5/socket.py", line 711, in create_connection
    raise err
  File "/usr/lib/python3.5/socket.py", line 702, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable
Did you try just importing tensorflow or keras from python in the container? Did that work?
Hi Ariella,
That works. But I get the errors when I run my Python code directly: python MNIST-scratch.py.
Thank you!
Chen
Ah, okay! In that case, the container works for what I thought you wanted. It looks like it can't connect to the Amazon S3 bucket to get the MNIST data. I'm not sure why that's not working; I would need to look at your code. @julianpistorius probably has an idea.
Figured it out. Looks like they block outbound network access from those nodes. Pre-downloading the `mnist.npz` into the `~/.keras/datasets` directory worked.
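The pre-download step could be scripted along these lines. This is a sketch under two assumptions: that it runs somewhere with outbound network access (for example a login node), and that the home directory is shared with the compute nodes. The URL is the one from the traceback earlier in the thread; the function name `ensure_cached` is hypothetical.

```python
import os
import urllib.request

MNIST_URL = "https://s3.amazonaws.com/img-datasets/mnist.npz"


def ensure_cached(url=MNIST_URL, cache_dir="~/.keras/datasets"):
    """Download url into cache_dir unless the file already exists; return its path."""
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(target):
        # Needs outbound network access, so run this where that is allowed
        urllib.request.urlretrieve(url, target)
    return target
```

Running `ensure_cached()` once before starting the GPU job leaves the file where Keras looks for it, so the script on the compute node should not need to reach the network.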
Hi Julian and Ariella,
Thank you very much for helping me! :)
Chen
I'd like to learn how to use Singularity to run deep learning code with GPUs on XSEDE and with multiple CPUs on Jetstream. I have at least MNIST code ready for that testing.