Hi: You'll need to set up GPU support for TensorFlow 2.
To verify the GPU is being used, you can run nvidia-smi at the command line during training.
The GPU will be used automatically; just make sure you have this at the top of your script/notebook:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
Also, DistilBERT trains in roughly half the time of BERT with nearly the same performance.
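If you want to try DistilBERT, it is available through ktrain's Transformer API. Here is a minimal sketch (the checkpoint name and parameters are illustrative, and it assumes x_train/y_train and x_test/y_test are lists of raw texts and labels):
import ktrain
from ktrain import text

# wrap a Hugging Face checkpoint in a ktrain preprocessor/model pair
t = text.Transformer('distilbert-base-uncased', maxlen=500,
                     class_names=['pos', 'neg'])
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(2e-5, 3)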
Hi Amaiya, thanks for your help, and apologies for the miscommunication -- I already have CUDA 10.1 and NVIDIA driver 435.21 set up, am reasonably comfortable with GPU computing, and can, for example, run the examples in the Hugging Face transformers library on the GPU without issue. Installing ktrain from requirements.txt also pulls in the TensorFlow 2 dependency; here is the relevant subset of packages from 'conda list':
tensorboard 2.1.1 pypi_0 pypi
tensorflow 2.1.0 pypi_0 pypi
tensorflow-datasets 3.2.0 pypi_0 pypi
tensorflow-estimator 2.1.0 pypi_0 pypi
tensorflow-metadata 0.22.2 pypi_0 pypi
I already have the environment variables set at the top of the file, but unfortunately this doesn't seem to have any effect; the code still runs on the CPU instead of the GPU:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import ktrain
from ktrain import text as txt

# load data
(x_train, y_train), (x_test, y_test), preproc = txt.texts_from_folder('data/aclImdb', maxlen=500,
                                                                      preprocess_mode='bert',
                                                                      train_test_names=['train', 'test'],
                                                                      classes=['pos', 'neg'])

# load model
model = txt.text_classifier('bert', (x_train, y_train), preproc=preproc)

# wrap model and data in ktrain.Learner object
learner = ktrain.get_learner(model,
                             train_data=(x_train, y_train),
                             val_data=(x_test, y_test),
                             batch_size=6)

# find good learning rate
learner.lr_find()  # briefly simulate training to find good learning rate
learner.lr_plot()  # visually identify best learning rate

# train using 1cycle learning rate schedule for 3 epochs
learner.fit_onecycle(2e-5, 3)
Here is the output:
(act2) peter@neutronium:~/github/act2$ python test2.py
detected encoding: utf-8
preprocessing train...
language: en
done. 1/1 :
Is Multi-Label? False
preprocessing test...
language: en
done. 1/1 :
Is Multi-Label? False
maxlen is 500
done.
simulating training for different learning rates... this may take a few moments...
Train on 25000 samples
Epoch 1/1024
24/25000 [..............................] - ETA: 12:26:04 - loss: 0.9763 - accuracy: 0.4167
Here is the nvidia-smi output at this time:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN RTX Off | 00000000:0D:00.0 On | N/A |
| 41% 49C P2 67W / 280W | 1341MiB / 24215MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1381 C python 161MiB |
| 0 1432 G /usr/lib/xorg/Xorg 372MiB |
| 0 1645 G /usr/lib/vmware/bin/vmware-vmx 53MiB |
| 0 3402 G /usr/bin/krunner 25MiB |
| 0 3404 G /usr/bin/plasmashell 102MiB |
| 0 11049 G /usr/bin/obs 133MiB |
| 0 16934 G /usr/bin/vlc 8MiB |
| 0 17841 G ...uest-channel-token=13999864370167093987 63MiB |
| 0 31521 G ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files 415MiB |
+-----------------------------------------------------------------------------+
Here is the output of top, showing that the process is running on the CPUs instead:
top - 15:32:42 up 2 days, 5:36, 10 users, load average: 20.13, 9.27, 3.69
Tasks: 565 total, 1 running, 392 sleeping, 0 stopped, 1 zombie
%Cpu(s): 70.9 us, 8.5 sy, 0.0 ni, 20.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13185091+total, 77732896 free, 17963680 used, 36154336 buff/cache
KiB Swap: 2097148 total, 2091260 free, 5888 used. 11183318+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1381 peter 20 0 26.852g 0.010t 347496 S 2478 8.0 50:59.66 python
Some other quick tests:
1) Setting the environment variables manually in the shell doesn't help either:
peter@neutronium:~/github/act2$ printenv | grep CUDA
CUDA_DEVICE_ORDER=PCI_BUS_ID
CUDA_VISIBLE_DEVICES=0
2) The above issues are on Python 3.7. The same issue occurs with a fresh Conda Python 3.8 install containing only the bare essentials (ktrain and its immediate dependencies), which uses tensorflow 2.2.0.
Thanks for the extra information. I do see that this is definitely using the CPU. When you ran the transformers examples, are you sure you were running the TensorFlow examples and not the PyTorch ones? I've verified that everything is working correctly on a local GPU as well as on Google Colab, so it still seems like a TF/CUDA issue to me.
One thing you can try is to re-run your ktrain BERT example but add this to the top of your script:
os.environ["SUPPRESS_TF_WARNINGS"]="0"
ktrain suppresses a lot of TensorFlow warnings by default; this will allow you to see them. Are there any warnings about CUDA?
Also, when you run the MNIST example below, does nvidia-smi show that it is using the GPU? Are there any errors or warnings related to CUDA or the GPU?
from __future__ import print_function
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28

# load MNIST and reshape to the backend's expected channel layout
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# scale pixel values to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# one-hot encode the labels
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# simple CNN classifier
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test, verbose=0))
See also this Medium article: TensorFlow 2.1 doesn't recognize my GPU, though Cuda 10.1 (with Solution)
I added some code to list the available GPUs, and TensorFlow wasn't able to see them -- so I reinstalled the drivers, and it appears to be working now. This was of course entirely on my end, and not an issue with ktrain. Thanks again for your help!
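For anyone hitting the same problem, the check was along these lines (a sketch, not necessarily the exact code I ran):
import tensorflow as tf

# an empty list here means TensorFlow cannot see any GPU and will
# silently fall back to the CPU
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices('GPU'))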
Hi there,
I'm new to ktrain, and the example models I'm running appear to run (slowly) on the CPU instead of the GPU. I'm using Ubuntu 18.04 and a Titan RTX with NVIDIA driver version 435.21.
1) I've tried two BERT demos: the huggingface demo and the aclImdb demo. Both seem to have this issue.
2) Some Googling suggested adding these lines to the top of the files, but it doesn't seem to have had an effect:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
3) Here is my requirements.txt file (hacked from another example):
torch==1.4
torchtext==0.5
transformers==2.11.0
spacy==2.2.4
matplotlib
gensim
sklearn
scikit-learn==0.21.3
scipy==1.4.1
ktrain
thanks, Peter