ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Chapter 13. Model with Embedding layers outputs NaN #606

Closed · mlkonopelski closed this issue 3 years ago

mlkonopelski commented 3 years ago

Hi,

I tried to implement this model as described in the book, but during training I get NaNs everywhere. I suspect that the weights went to Inf or that input_dim has the wrong value (those are the usual fixes suggested on Stack Overflow), but I'm not sure why.

Here is my code to reproduce the issue:

# Prepare data
import pandas as pd
import tensorflow as tf
from tensorflow import keras

housing_file_path = 'https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv'
housing_df = pd.read_csv(housing_file_path)

# 8 numerical features, 1 categorical feature ('ocean_proximity'), and the target
X_train_regular = housing_df.drop(columns=['median_house_value', 'ocean_proximity']).to_numpy()
X_train_categorical = housing_df['ocean_proximity'].to_numpy()
y_train = housing_df['median_house_value'].to_numpy()

NUM_OOV_BUCKETS = 2
VOCAB = ['NEAR BAY', '<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'ISLAND']

# Lookup table mapping each category string to an integer index
# (out-of-vocabulary strings are hashed into 2 extra buckets)
indices = tf.range(len(VOCAB), dtype=tf.int64)
table_init = tf.lookup.KeyValueTensorInitializer(VOCAB, indices)
table = tf.lookup.StaticVocabularyTable(table_init, NUM_OOV_BUCKETS)

regular_input = keras.layers.Input(shape=[8])
categorical_input = keras.layers.Input(shape=[], dtype=tf.string)
cat_indices = keras.layers.Lambda(lambda cats: table.lookup(cats))(categorical_input)
# 7 possible indices (5 vocab + 2 OOV buckets), each mapped to a 2D embedding
cat_embed = keras.layers.Embedding(input_dim=len(VOCAB) + NUM_OOV_BUCKETS, output_dim=2)(cat_indices)
concat_inputs = keras.layers.concatenate([regular_input, cat_embed])
#Dense1 = keras.layers.Dense(10, 'relu')(concat_inputs)
outputs = keras.layers.Dense(1)(concat_inputs)

model = keras.models.Model(inputs=[regular_input, categorical_input],
                           outputs=[outputs])

model.compile(optimizer='adam', loss='mean_squared_error', metrics=[keras.metrics.RootMeanSquaredError()])

history = model.fit([X_train_regular, X_train_categorical], y_train, batch_size=32, epochs=10, validation_split=0.1)

Result:

Epoch 1/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 2/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 3/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 4/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 5/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 6/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 7/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 8/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 9/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan
Epoch 10/10
581/581 [==============================] - 1s 1ms/step - loss: nan - root_mean_squared_error: nan - val_loss: nan - val_root_mean_squared_error: nan

ageron commented 3 years ago

Hi @mlkonopelski , Thanks for your question. I've copied your code into the following Colab notebook to debug it: https://colab.research.google.com/drive/1DMeCPC93ot417NNqpCBgEkA4UpEwik1v?usp=sharing

I just called dropna() on housing_df, and now there are no NaNs during training.
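In your code, that's one extra line right after loading the CSV (this assumes simply dropping the incomplete rows is acceptable; imputing the column median, as discussed in Chapter 2, is the other common option):

housing_df.isna().sum()           # shows which columns contain missing values
housing_df = housing_df.dropna()  # drop the incomplete rows before building the arrays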

Indeed, the housing dataset contains a few NaN values, and if you don't fix them before training, things go wrong. That's because any computation involving NaN returns NaN, so as soon as one NaN value is encountered, the model weights get updated to NaN, and after that there's no way to get any other result.
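You can see this poisoning effect in isolation with plain NumPy (just an illustration, not part of the fix):

import numpy as np

print(np.nan + 1.0)                  # nan: arithmetic involving NaN yields NaN
print(np.mean([1.0, 2.0, np.nan]))   # nan: a single NaN poisons the whole reduction
# The loss is such a reduction, so the gradients and every weight update become NaN too.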

The other thing you should do is scale the numerical features; please see Chapter 2 for more details.
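For example, here is a minimal sketch using scikit-learn's StandardScaler (Chapter 2 builds a full preprocessing pipeline; note that the scaler should be fit on the training data only):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_regular = scaler.fit_transform(X_train_regular)  # zero mean, unit variance per column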

Lastly, there's now a second edition of my book, based on TensorFlow 2 instead of TensorFlow 1, which is much simpler to use. The code examples are open source, you can check them out at https://github.com/ageron/handson-ml2

Hope this helps.