ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Chapter 9: MSE not improving when using placeholders #428

Closed: amarrerod closed this issue 5 years ago

amarrerod commented 5 years ago

Hi @ageron,

I'm following the examples in Chapter 9 and everything is going fine until the Gradient Descent optimization with placeholders. At that point, the MSE I compute does not improve over time; it actually looks like a random value, so visualizing its evolution in TensorBoard is almost impossible.

The output I'm getting is the following:

Output:

Epoch 0, MSE: 0.5039241313934326
Epoch 10, MSE: 0.5623587965965271
Epoch 20, MSE: 0.5565890669822693
Epoch 30, MSE: 0.5912410020828247
Epoch 40, MSE: 0.3456602096557617
Epoch 50, MSE: 0.44132304191589355
Epoch 60, MSE: 0.5516647696495056
Epoch 70, MSE: 0.4493933618068695
Epoch 80, MSE: 0.5868946313858032
Epoch 90, MSE: 0.3868793547153473
Epoch 100, MSE: 0.5991948843002319
[...]
Epoch 300, MSE: 0.586829423904419
[...]
Epoch 500, MSE: 0.4198234975337982
[...]
Epoch 610, MSE: 0.4104050099849701
Epoch 620, MSE: 0.49277451634407043
Epoch 630, MSE: 0.6931162476539612
Epoch 640, MSE: 0.5174148678779602
Epoch 650, MSE: 0.33457547426223755
[...]
Epoch 790, MSE: 0.5538029670715332
Epoch 800, MSE: 0.706266462802887
Epoch 810, MSE: 0.554472029209137
Epoch 820, MSE: 0.49948737025260925
Epoch 830, MSE: 0.6453415155410767
Epoch 840, MSE: 0.4202166497707367
Epoch 850, MSE: 0.3483142852783203
Epoch 860, MSE: 0.4129936695098877
[...]
Epoch 920, MSE: 0.5643522143363953
Epoch 930, MSE: 0.4289623200893402
Epoch 940, MSE: 0.4028488099575043
Epoch 950, MSE: 0.662270188331604
Epoch 960, MSE: 0.43962904810905457
Epoch 970, MSE: 0.38815638422966003
Epoch 980, MSE: 0.4879280924797058
Epoch 990, MSE: 0.45457449555397034
Best theta: [[ 2.0714476 ]
 [ 0.8462012 ]
 [ 0.11558536]
 [-0.26835835]
 [ 0.32982785]
 [ 0.00608358]
 [ 0.07052912]
 [-0.8798858 ]
 [-0.86342514]]

This is my current code. I have checked other issues related to this exercise, but nothing seems to work. I'm using tensorflow 2.0.0a0; could that be the issue?

Hope that you can clue me in.

Thank you in advance.

#!/usr/bin/python3
# Page 239
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing

tf.logging.set_verbosity(tf.logging.ERROR)  # no more log spam

# Loading the data
housing = fetch_california_housing()
m, n = housing.data.shape

# Dividing the set into 100-items batches
batch_size = 100
n_batches = int(np.ceil(m / batch_size))
n_epochs = 1000
learning_rate = 0.01

scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

# Function to generate the batches

def fetch_batch(epoch, batch_index, batch_size):
    # Seed depends on the epoch and batch index, so every batch is
    # different but the whole run stays reproducible
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)  # sampling with replacement
    x_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return x_batch, y_batch

# Using Placeholders
X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

theta = tf.Variable(tf.random_uniform(
    [n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_predicted = tf.matmul(X, theta, name="predictions")
error = y_predicted - y

mse = tf.reduce_mean(tf.square(error), name="mse")
# USING THE TF GRADIENT DESCENT OPTIMIZER
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            x_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            mse_value, _ = sess.run([mse, training_op],
                                    feed_dict={X: x_batch, y: y_batch})
        if epoch % 10 == 0:
            print(f"Epoch {epoch}, MSE: {mse_value}")
    best_theta = theta.eval()
    print(f"Best theta: {best_theta}")
ageron commented 5 years ago

Hi @amarrerod, the MSE displayed in this code is actually just the MSE on the last batch, not on the full training set, which is why it looks random. If you compute the MSE over the whole training set instead, it should look much better:

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            x_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: x_batch, y: y_batch})
        # Evaluate the MSE on the full training set, not just the last batch
        mse_value = sess.run(mse, feed_dict={
            X: scaled_housing_data_plus_bias, y: housing.target.reshape(-1, 1)})
        print(f"Epoch {epoch}, MSE: {mse_value}")
    best_theta = theta.eval()
    print(f"Best theta: {best_theta}")

It actually converges pretty quickly, which explains why it doesn't seem to go down. You can reduce the learning rate (e.g., to 1e-4) to see it go down slowly.
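
For illustration, that change is a one-line tweak to the script above (nothing else needs to change):

learning_rate = 1e-4  # smaller steps, so the full-set MSE decreases visibly across epochs
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)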

Hope this helps!

amarrerod commented 5 years ago

Hi @ageron,

Thanks for your help, now it seems to perform well!