ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

[QUESTION] Regarding Hyperparameter Tuning of NN with keras/sklearn #587

Open chrisflip opened 1 year ago

chrisflip commented 1 year ago

Hi, first of all, thanks for this amazing book. I have a question regarding Chapter 10, hyperparameter tuning with Keras and sklearn: the model allows for multiple hidden layers, but I believe n_neurons is fixed across all of them. How can I make the model more flexible, so that n_neurons can vary from layer to layer?

Best, Chris

import numpy as np
from tensorflow import keras

def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[8]):  # <=== n_neurons ???
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))  # <=== ???
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)

keras_reg.fit(X_train, y_train, epochs=100,
              validation_data=(X_valid, y_valid),
              callbacks=[keras.callbacks.EarlyStopping(patience=10)])

from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    **"n_neurons": np.arange(1, 100)**               .tolist(),
    "learning_rate": reciprocal(3e-4, 3e-2)      .rvs(1000).tolist(),
}

rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3, verbose=2)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])
ageron commented 1 year ago

Hi @chrisflip ,

Thanks for your question and sorry for the late reply.

I see two options:

  1. Add one parameter per layer, e.g., n_neurons1, n_neurons2, etc. (see the first sketch below).
  2. Add a single parameter n_neurons that holds a list of layer sizes (e.g., [100, 50, 10]) and use a custom function in the param_distribs dictionary to sample from this multi-dimensional space (see the second sketch below).
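
I haven't tested these, but here are rough sketches of both options. The names (n_neurons1, n_neurons2, n_neurons3, candidate_architectures) and the sampling ranges are just illustrative, and for the second option I pre-generate a pool of candidate architectures rather than passing a custom sampler, since RandomizedSearchCV samples uniformly from a plain list:

# Option 1: one size parameter per hidden layer (n_hidden capped at 3 here,
# so each possible layer gets its own hyperparameter; unused ones are ignored).
def build_model_v1(n_hidden=1, n_neurons1=30, n_neurons2=30, n_neurons3=30,
                   learning_rate=3e-3, input_shape=[8]):
    layer_sizes = [n_neurons1, n_neurons2, n_neurons3]
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(layer_sizes[layer], activation="relu"))
    model.add(keras.layers.Dense(1))
    model.compile(loss="mse",
                  optimizer=keras.optimizers.SGD(learning_rate=learning_rate))
    return model

param_distribs_v1 = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons1": np.arange(1, 100).tolist(),
    "n_neurons2": np.arange(1, 100).tolist(),
    "n_neurons3": np.arange(1, 100).tolist(),
    "learning_rate": reciprocal(3e-4, 3e-2).rvs(1000).tolist(),
}

# Option 2: a single n_neurons parameter holding a list of layer sizes.
# Pre-generate a pool of random architectures (0 to 3 hidden layers,
# 1 to 99 neurons each) for RandomizedSearchCV to pick from.
rng = np.random.default_rng(42)
candidate_architectures = [
    rng.integers(1, 100, size=n_hidden).tolist()
    for n_hidden in rng.integers(0, 4, size=1000)
]

def build_model_v2(n_neurons=(30,), learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for units in n_neurons:  # one Dense layer per entry in the list
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(1))
    model.compile(loss="mse",
                  optimizer=keras.optimizers.SGD(learning_rate=learning_rate))
    return model

param_distribs_v2 = {
    "n_neurons": candidate_architectures,
    "learning_rate": reciprocal(3e-4, 3e-2).rvs(1000).tolist(),
}

The rest of the search stays the same as in your snippet: wrap the build function in a KerasRegressor and pass the corresponding param_distribs to RandomizedSearchCV.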

That said, I don't think it's necessary. People used to tune a different size for each layer, but it complicates things and in practice it doesn't really help. Using the same number of neurons in every hidden layer usually works fine. There's essentially one exception: you may want a bottleneck layer in the middle, as in autoencoders, but that only requires one additional parameter (see the sketch below).
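
For example, something along these lines (untested sketch; bottleneck_neurons is just an illustrative name):

# Same size everywhere except a narrower layer in the middle.
def build_model_bottleneck(n_hidden=3, n_neurons=30, bottleneck_neurons=10,
                           learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        units = bottleneck_neurons if layer == n_hidden // 2 else n_neurons
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(1))
    model.compile(loss="mse",
                  optimizer=keras.optimizers.SGD(learning_rate=learning_rate))
    return model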

Hope this helps.