YeongJunKim / issue

footprint of fixed error & some tips.

linux 18.04 tensorflow #8

Open YeongJunKim opened 4 years ago

YeongJunKim commented 4 years ago
conda install -y python==3.6.9
tensorflow-gpu==1.14
keras==2.3.0
CUDA=10.0

Basic guide

TensorFlow website: GPU installation

CUDA downgrade

Checking whether TensorFlow is using the GPU

conda PermissionError 13

Blacklist the nouveau driver

sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"

Confirm the content of the new modprobe config file

cat /etc/modprobe.d/blacklist-nvidia-nouveau.conf

blacklist nouveau
options nouveau modeset=0
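
The manual check above can also be scripted. The following is a minimal sketch, assuming the file path and the two directives shown in this guide; the helper name missing_directives is hypothetical and only used for illustration.

```python
from pathlib import Path

# Path written by the two "sudo bash -c" commands in this guide.
CONF_PATH = Path("/etc/modprobe.d/blacklist-nvidia-nouveau.conf")

# The two directives the file is expected to contain.
EXPECTED = ("blacklist nouveau", "options nouveau modeset=0")


def missing_directives(text, expected=EXPECTED):
    """Return the expected directives that do not appear as a line in text."""
    lines = {line.strip() for line in text.splitlines()}
    return [d for d in expected if d not in lines]


if __name__ == "__main__":
    if not CONF_PATH.exists():
        print(f"{CONF_PATH} does not exist")
    else:
        missing = missing_directives(CONF_PATH.read_text())
        print("OK" if not missing else f"missing: {missing}")
```

An empty result means both lines are present and the initramfs update can proceed.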

Update kernel initramfs

sudo update-initramfs -u

Reboot

sudo reboot
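
After the reboot it may be worth confirming that nouveau is no longer loaded. The sketch below is not part of the original notes; it assumes a Linux system where /proc/modules lists one loaded module per line with the module name as the first field, and the function names are hypothetical.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Parse /proc/modules content; the module name is the first field of each line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_is_loaded(proc_modules_text):
    """True if a module named exactly 'nouveau' is in the loaded-module list."""
    return "nouveau" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        print("nouveau loaded:", nouveau_is_loaded(path.read_text()))
    else:
        print("/proc/modules not found (not a Linux system?)")
```

If this still reports that nouveau is loaded, the blacklist file or the initramfs update probably did not take effect.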

Remove Any Other NVIDIA Driver

Add the graphics driver PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update

Purge the driver

sudo apt-get purge 'nvidia*' && sudo apt-get autoremove && sudo apt-get autoclean && sudo rm -rf /usr/local/cuda*

Reboot

sudo reboot

Add Nvidia package repositories

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

Install Nvidia driver

sudo apt-get install --no-install-recommends nvidia-driver-418

Reboot

sudo reboot

Install Runtime & Development Libraries (cuDNN)

sudo apt-get install --no-install-recommends cuda-10-0 libcudnn7=7.6.2.24-1+cuda10.0 libcudnn7-dev=7.6.2.24-1+cuda10.0

Install TensorRT

sudo apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 libnvinfer-dev=5.1.5-1+cuda10.0

Check the NVIDIA installation

# Check the installed driver with nvidia-smi
nvidia-smi
# Check the CUDA version with nvcc
nvcc --version
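
To compare the nvcc result against the CUDA 10.0 target of this guide without reading the output by eye, something like the following could be used. It is a sketch that assumes nvcc prints a line containing "release X.Y", as CUDA 10.0 does; the function names are hypothetical. It avoids subprocess options newer than Python 3.6 because the guide pins Python 3.6.9.

```python
import re
import shutil
import subprocess

EXPECTED_CUDA = "10.0"  # the CUDA version this guide installs


def parse_nvcc_release(output):
    """Extract the 'release X.Y' version from nvcc --version output, or None."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc --version if nvcc is on PATH; return the release string or None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found or version not recognized")
    elif release == EXPECTED_CUDA:
        print(f"CUDA {release} matches the expected version")
    else:
        print(f"CUDA {release} found, expected {EXPECTED_CUDA}")
```

If nvcc is installed but not found, /usr/local/cuda/bin may be missing from PATH.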

Install Anaconda package

Download the installer from the Anaconda website.

If conda fails with PermissionError 13 (replace user with your username and run this from the directory that contains anaconda3)

sudo chown -R user anaconda3

Install Python 3.6.9 with conda

conda install -y python==3.6.9

Virtual environment

conda create -n tf-gpu3.6.9 python=3.6.9
source activate tf-gpu3.6.9

Install TensorFlow & Keras

conda install tensorflow-gpu==1.14
conda install keras==2.3

# The three packages below turned out to be installed automatically as dependencies.
conda install cudatoolkit==9.0
conda install cudnn==7.1.2
conda install h5py
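
Once the installs finish, the versions can be compared against the pins in this guide (tensorflow-gpu 1.14, keras 2.3). This is a rough sketch, assuming both packages expose a __version__ attribute under the import names tensorflow and keras; PINS, matches_pin and check_pins are hypothetical names.

```python
import importlib

# Pins taken from this guide: import name -> version prefix.
PINS = {"tensorflow": "1.14", "keras": "2.3"}


def matches_pin(installed, pin):
    """True if installed starts with the same dot-separated components as pin."""
    pin_parts = pin.split(".")
    return installed.split(".")[: len(pin_parts)] == pin_parts


def check_pins(pins=PINS):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, pin in pins.items():
        try:
            version = getattr(importlib.import_module(name), "__version__", None)
        except Exception:  # not installed, or the import itself failed
            version = None
        report[name] = (version, version is not None and matches_pin(version, pin))
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_pins().items():
        print(f"{name}: installed={version} pinned={PINS[name]} ok={ok}")
```

Run it inside the activated tf-gpu3.6.9 environment; a version of None means the package could not be imported there.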
YeongJunKim commented 4 years ago

Check that TensorFlow, Keras, and PyTorch are using the GPU

1. TensorFlow

import tensorflow as tf

print(tf.__version__)
# 1.14.0

tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)
# True

2. Keras

import keras  # needed for keras.__version__ below
from keras import backend as K

print(keras.__version__)
# 2.2.4
K.tensorflow_backend._get_available_gpus()
# ['/job:localhost/replica:0/task:0/device:GPU:0']

3. PyTorch

import torch

torch.cuda.device_count()
# 1

torch.cuda.get_device_name(0)
# GeForce RTX 2080 Ti

torch.cuda.is_available()
# True