ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

GPU, TPU in Colab #11

Closed plubinski closed 4 years ago

plubinski commented 5 years ago

Hello. I have an idea: maybe you could add README instructions on how to use Keras and TensorFlow 2.0 with a GPU or TPU in Colab? By default, everything runs on the CPU.

ageron commented 5 years ago

Hi @plubinski ,

Thanks for your suggestion. I'm currently writing the last chapter of the 2nd edition, and it will cover running on a GPU, distributing training across multiple GPUs and servers, training on TPUs, and more. Once it's done, I'll upload the Jupyter notebook, and it should hopefully run on Colab.

Hope this helps.

ageron commented 5 years ago

To use GPU in Colab, all you need to do is:

!pip install -U --pre tensorflow-gpu
# OR
!pip install -U tf-nightly-gpu-2.0-preview

Make sure you have TF 2.0:

import tensorflow as tf
tf.__version__

Now create your Keras model and train it; it will automatically use the GPU.
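
To check that a GPU is actually visible before training, you can list the physical devices (a quick sanity check; in TF 2.0 this still lives under tf.config.experimental):

import tensorflow as tf

# An empty list means TensorFlow cannot see a GPU and training will fall back to the CPU.
gpus = tf.config.experimental.list_physical_devices("GPU")
print("Visible GPUs:", gpus)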

If you want to convince yourself, you can try this:

from tensorflow import keras

with tf.device("/cpu:0"):
    model = keras.models.Sequential([...])  # your layers here
    model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd")

model.fit(X_train, y_train, epochs=3)  # X_train / y_train: your training data

This forces the model to be placed on the CPU rather than the GPU, and you will see that it trains and runs much more slowly (especially if you use convolutional layers). That said, the GPUs on Colab are not the fastest, so the difference will not necessarily be dramatic.
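
For reference, here is a self-contained sketch of that comparison; the Fashion MNIST data and the small dense model are placeholders chosen for illustration, and with dense layers the gap will be smaller than with convolutions:

import time
import tensorflow as tf
from tensorflow import keras

(X_train, y_train), _ = keras.datasets.fashion_mnist.load_data()
X_train = X_train / 255.0  # scale pixel values to the 0-1 range

def train_on(device):
    # Build and compile the model under the given device scope, as in the snippet above.
    with tf.device(device):
        model = keras.models.Sequential([
            keras.layers.Flatten(input_shape=[28, 28]),
            keras.layers.Dense(300, activation="relu"),
            keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd")
    start = time.time()
    model.fit(X_train, y_train, epochs=1, verbose=0)
    print(device, "-> %.1f seconds" % (time.time() - start))

train_on("/cpu:0")
train_on("/gpu:0")  # requires a GPU to be visible to TensorFlow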

Hope this helps.

plubinski commented 5 years ago

@ageron thanks for the fast answer.

I usually use a Google Cloud VM with a Tesla GPU, but this tip will help many people learn faster.

You are doing a great job!

aisensiy commented 5 years ago

@ageron The code below is from the Char-RNN section of chapter 16. So... what kind of device are you using to make the training so fast? On my V100 each epoch takes about 5 hours; how do you get under 2 hours per epoch?

In [18]:
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id],
                     # no dropout in stateful RNN (https://github.com/ageron/handson-ml2/issues/32)
                     # dropout=0.2, recurrent_dropout=0.2,
                     ),
    keras.layers.GRU(128, return_sequences=True,
                     # dropout=0.2, recurrent_dropout=0.2
                    ),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                    activation="softmax"))
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
history = model.fit(dataset, steps_per_epoch=train_size // batch_size,
                    epochs=10)
Epoch 1/10
31370/31370 [==============================] - 6063s 193ms/step - loss: 1.7662
Epoch 2/10
31370/31370 [==============================] - 5744s 183ms/step - loss: 1.6649
Epoch 3/10
31370/31370 [==============================] - 5320s 170ms/step - loss: 1.6508
Epoch 4/10
31370/31370 [==============================] - 5318s 170ms/step - loss: 1.6400
Epoch 5/10
31370/31370 [==============================] - 5318s 170ms/step - loss: 1.6359
Epoch 6/10
31370/31370 [==============================] - 5316s 169ms/step - loss: 1.6344
Epoch 7/10
31370/31370 [==============================] - 5489s 175ms/step - loss: 1.6336
Epoch 8/10
31370/31370 [==============================] - 5638s 180ms/step - loss: 1.6277
Epoch 9/10
31370/31370 [==============================] - 5709s 182ms/step - loss: 1.6309
Epoch 10/10
31370/31370 [==============================] - 6107s 195ms/step - loss: 1.6317

ageron commented 5 years ago

Hi @aisensiy, interesting, thanks for your question! I believe I used a Google VM with a good GPU, but I forgot which one I used. Perhaps a T4, but I honestly don't remember.

aisensiy commented 5 years ago

Thanks for your reply. I found that it was because I was using TensorFlow 2.0.0beta... After upgrading to 2.0.0, the speed catches up 😃
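
For anyone who hits a similar slowdown, a minimal sanity check (a sketch, assuming TF 2.0; not from the original thread) is to confirm the GPU is visible and turn on device-placement logging, since the fast GRU path relies on the fused cuDNN kernel (which requires, among other things, recurrent_dropout=0):

import tensorflow as tf

print(tf.__version__)  # the speed-up above was observed after upgrading from 2.0.0beta to 2.0.0
print("GPUs:", tf.config.experimental.list_physical_devices("GPU"))

# Log which device each op is placed on; the GRU ops should show up on /GPU:0
# when TensorFlow can use its fused (cuDNN) kernel.
tf.debugging.set_log_device_placement(True)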