Open dmckinno opened 3 years ago
I’m seeing the same thing - GPU use is also quite low, with observable gaps. I’m not sure what they’re going for with this.
Thank you very much for reporting this. Could you provide a reproducible test case, so we know exactly what you are running and can investigate locally?
There is an optional mlcompute.set_mlc_device(device_name='any') API for ML Compute device selection. The default value for device_name is 'any', which means ML Compute will select the best available device on your system, including multiple GPUs on multi-GPU configurations. Could you try running with 'cpu' and 'gpu' and let us know what you see? Thank you!
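For example, a minimal sketch of the selection call (only the module path and the three device names come from this thread):

from tensorflow.python.compiler.mlcompute import mlcompute

# Pick one of the options described above before building the model.
mlcompute.set_mlc_device(device_name='any')    # default: let ML Compute choose
# mlcompute.set_mlc_device(device_name='cpu')  # force CPU
# mlcompute.set_mlc_device(device_name='gpu')  # force GPU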
Sure. Code below.
When I set mlcompute.set_mlc_device(device_name='cpu'), the wall time increased to almost 17 minutes (see below). ML Compute is clearly accelerating something on the GPU, but it is doing it much less efficiently than PlaidML.
You can see that it is using the Radeon rather than the Intel GPU from the attached Activity Monitor screenshots (device_name='gpu' above and device_name='cpu' below).
Epoch 1/5
59/59 [==============================] - 203s 3s/step - loss: 1.2530 - accuracy: 0.6592
Epoch 2/5
59/59 [==============================] - 197s 3s/step - loss: 0.0908 - accuracy: 0.9719
Epoch 3/5
59/59 [==============================] - 205s 3s/step - loss: 0.0530 - accuracy: 0.9833
Epoch 4/5
59/59 [==============================] - 207s 4s/step - loss: 0.0306 - accuracy: 0.9903
Epoch 5/5
59/59 [==============================] - 205s 3s/step - loss: 0.0247 - accuracy: 0.9921
CPU times: user 2h 6min 46s, sys: 38.4 s, total: 2h 7min 25s
Wall time: 16min 58s
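As an aside, the "CPU times / Wall time" lines above look like IPython's %%time output; outside a notebook, the same wall-clock number can be captured with the standard time module by wrapping the final model.fit call of either script below (shown here with the ML Compute version's call):

import time

start = time.perf_counter()   # wall-clock timer, comparable to %%time's "Wall time"
model.fit(x_train, y_train, epochs=5, batch_size=1024)
print(f"Wall time: {(time.perf_counter() - start) / 60:.1f} min")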
PlaidML
import numpy as np
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
import keras
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28, 1)),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(np.expand_dims(x_train,3), y_train, epochs=5, batch_size=1024)
ML Compute
import tensorflow as tf
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')  # Available options are 'cpu', 'gpu', and 'any'.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=1024)
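Side note on the two scripts: the ML Compute version builds the loss with from_logits=True even though the final Dense layer applies softmax, while the PlaidML version uses the string loss, which defaults to from_logits=False. That shouldn't affect the speed comparison, but to train exactly the same objective in both runs the loss line could be, for example:

# Treat the softmax outputs as probabilities, matching the PlaidML script's
# default 'sparse_categorical_crossentropy' behaviour.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)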
@anna-tikhonova, any resolution here? Would love to begin porting some code from PlaidML to ML Compute.
I have the same issue. This is a script I ran on ML Compute:
import tensorflow as tf
tf.config.run_functions_eagerly(False)
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
import pickle
import warnings
import time

warnings.filterwarnings("ignore")
NAME = "Cats-vs-Dogs-64*2-{}".format(int(time.time()))
tensorboard = TensorBoard(log_dir=f'logs/{NAME}')

X = pickle.load(open("X.pickle", "rb"))
y = pickle.load(open("y.pickle", "rb"))
X = X / 255.0
model=Sequential()
model.add(Conv2D(64,(3,3),input_shape=X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())#convolution is 2d whereas dense layer needs 1d so flatten,probably
model.add(Dense(64))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, batch_size=50, validation_split=0.1, epochs=10, callbacks=[tensorboard])

Output:
Train on 22451 samples, validate on 2495 samples
Epoch 1/10
22451/22451 [==============================] - 84s 4ms/sample - loss: 0.6625 - accuracy: 0.6123 - val_loss: 0.6311 - val_accuracy: 0.6653
while on PlaidML the ETA was about 64s. It seems ML Compute doesn't utilise the GPU to its full extent. Does ML Compute have an option, like PlaidML offers during setup, to select a specific GPU? @anna-tikhonova
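For reference, on the PlaidML side the specific GPU is normally picked ahead of time with the interactive plaidml-setup tool, which writes the choice to ~/.plaidml; as far as this thread shows, ML Compute only exposes 'cpu', 'gpu', and 'any'. A sketch of overriding the PlaidML device from Python, where the PLAIDML_DEVICE_IDS variable name and the device id string are assumptions on my part rather than something confirmed in this thread:

import os

# Must be set before keras is imported with the PlaidML backend.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
# Assumed override of the plaidml-setup choice; the id below is illustrative only,
# real ids are listed when running plaidml-setup.
os.environ["PLAIDML_DEVICE_IDS"] = "metal_amd_radeon_pro_560x.0"

import keras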
Training the network below for 5 epochs on the tf.keras.datasets.mnist dataset takes ~5x longer with ML Compute than with PlaidML. Is this expected behavior?
Note that both of these are significantly faster than CPU training, but PlaidML seems to do a much better job with acceleration. Are there ML Compute-specific considerations that I need to keep in mind?
Model
PlaidML
ML Compute