dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Keras serializer needs to recompile model for more training #2183

Open stsievert opened 5 years ago

stsievert commented 5 years ago

In the Keras FAQ,

You can use model.save(filepath) to save a Keras model into a single HDF5 file which will contain:

  • the architecture of the model, allowing to re-create the model
  • the weights of the model
  • the training configuration (loss, optimizer)
  • the state of the optimizer, allowing to resume training exactly where you left off.

serialize_keras_model serializes the first two items, the architecture and weights of the model. It does not serialize the optimizer or its state.

A brief example:

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras import backend as K
from distributed.protocol import serialize, deserialize

model, x_train, y_train = model_and_data()  # defined below
opt = keras.optimizers.Adam()
assert not hasattr(model, 'loss')
model.compile(loss='categorical_crossentropy',
              optimizer=opt)
assert model.loss == 'categorical_crossentropy'

model.fit(x_train, y_train,
          batch_size=128,
          epochs=1,
          verbose=1)

m2 = deserialize(*serialize(model))
assert not hasattr(m2, 'loss')

m2.fit(x_train, y_train)

which raises a RuntimeError because the model hasn't been compiled:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-77-e747914aaf97> in <module>()
----> 1 m2.fit(x_train, y_train)

/Users/ssievert/anaconda3/envs/dask-master/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
    953             sample_weight=sample_weight,
    954             class_weight=class_weight,
--> 955             batch_size=batch_size)
    956         # Prepare validation data.
    957         do_validation = False

/Users/ssievert/anaconda3/envs/dask-master/lib/python3.6/site-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
    678         if y is not None:
    679             if not self.optimizer:
--> 680                 raise RuntimeError('You must compile a model before '
    681                                    'training/testing. '
    682                                    'Use `model.compile(optimizer, loss)`.')

RuntimeError: You must compile a model before training/testing. Use `model.compile(optimizer, loss)`.
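The immediate workaround is to recompile the deserialized model by hand before fitting. A minimal sketch, reusing the loss and a fresh optimizer from the example above; note that this creates new optimizer state, so training does not resume exactly where it left off:

# Workaround: recompile before fitting. This builds a *fresh* Adam
# optimizer, so any accumulated optimizer state (e.g., moment
# estimates) from the earlier training run is discarded.
m2.compile(loss='categorical_crossentropy',
           optimizer=keras.optimizers.Adam())
m2.fit(x_train, y_train, batch_size=128, epochs=1, verbose=1)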

Here's the definition of model_and_data:

def model_and_data():
    img_rows, img_cols = 28, 28
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

    x_train = x_train.astype('float32') / 255
    x_test = x_test.astype('float32') / 255
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test = keras.utils.to_categorical(y_test, 10)

    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))
    return model, x_train, y_train

stsievert commented 5 years ago

It looks like we'll have to adapt the logic in keras/engine/saving.py#L137-L184.
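For concreteness, here's a rough sketch of what an extended serializer might capture, mirroring what that saving.py block writes to HDF5. The function name and payload layout are hypothetical, not the actual distributed implementation:

import json
import keras

def serialize_keras_model_with_optimizer(model):
    # Hypothetical sketch. Architecture and weights are what
    # serialize_keras_model already handles today.
    payload = {
        'architecture': model.to_json(),
        'weights': model.get_weights(),
        'training_config': None,
        'optimizer_weights': None,
    }
    # Mirror keras/engine/saving.py: only compiled models carry an optimizer.
    if getattr(model, 'optimizer', None) is not None:
        payload['training_config'] = json.dumps({
            'optimizer_config': keras.optimizers.serialize(model.optimizer),
            'loss': model.loss,  # a string here; callables need more care
        })
        # Optimizer weight variables only exist once training has started.
        if getattr(model.optimizer, 'weights', None):
            payload['optimizer_weights'] = model.optimizer.get_weights()
    return payload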

martindurant commented 5 years ago

I know, of course, that the use case here is serialization between processes, but I wonder: how would this look as a data source for Intake, i.e., treating the model itself as data? Does the particular HDF5 file saved by Keras constitute a well-defined format?

stsievert commented 5 years ago

Does the particular HDF5 file saved by Keras constitute a well-defined format?

I think it's a well-defined format. They have some tests to check that the file can be loaded and easily read (i.e., they cover more than dumping/loading some binary file).

More to the point, the file produced by model.save also includes optimizer information. This issue is aimed at that.
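For comparison, the full HDF5 round trip through model.save / load_model does restore the optimizer, so training resumes without recompiling. A minimal sketch using the model from the example above:

from keras.models import load_model

# model.save writes architecture, weights, training configuration,
# and optimizer state into one HDF5 file.
model.save('model.h5')

# load_model recompiles the model from the stored training config,
# so fit works immediately and the optimizer picks up where it left off.
restored = load_model('model.h5')
restored.fit(x_train, y_train, batch_size=128, epochs=1, verbose=1)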

mrocklin commented 5 years ago

So, two questions here:

  1. Is it correct to send along the optimizer and loss with the model?
  2. What is the right way to do this? The code @stsievert points to above for h5py could work, but is complex. I notice that Optimizer also implements get/from_config and get/set_weights/gradients. Do these suffice? (See the sketch below.)

cc @bnaul, who wrote the original implementation.
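Regarding question 2, a rough (untested) sketch of how those methods might be combined; note that set_weights only works after the optimizer's weight variables exist, which keras/engine/saving.py arranges by calling model._make_train_function() first:

import keras

def serialize_optimizer(optimizer):
    # Config (class name + hyperparameters) plus current state, if any.
    config = keras.optimizers.serialize(optimizer)
    weights = None
    if getattr(optimizer, 'weights', None):
        weights = optimizer.get_weights()
    return config, weights

def deserialize_optimizer(config, weights, model, loss):
    # Hypothetical helpers, just to show the round trip.
    optimizer = keras.optimizers.deserialize(config)
    model.compile(loss=loss, optimizer=optimizer)
    if weights is not None:
        # Optimizer weight variables are created lazily; building the
        # train function first is how keras/engine/saving.py does it.
        model._make_train_function()
        optimizer.set_weights(weights)
    return model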

bnaul commented 5 years ago

Seems reasonable to me to include the parameters of the optimizer. I don't recall why this wasn't part of the original implementation, but maybe that wasn't part of what got saved to disk back then? It was originally written against a much older version of keras.

mrocklin commented 5 years ago

I don't suppose you have any inclination to repeat your previous excellent work and extend this to include the optimizer if present, do you @bnaul ?


mrocklin commented 5 years ago

:)


bnaul commented 5 years ago

I've been out of the keras loop for a while now, so I'm afraid I probably won't have time to get into this.