ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Apache License 2.0

Chapter 11. SELU example #263

Closed: alogblog closed this 5 years ago

alogblog commented 6 years ago

Original

means = X_train.mean(axis=0, keepdims=True)
stds = X_train.std(axis=0, keepdims=True) + 1e-10
X_val_scaled = (X_valid - means) / stds

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            X_batch_scaled = (X_batch - means) / stds
            sess.run(training_op, feed_dict={X: X_batch_scaled, y: y_batch})
        if epoch % 5 == 0:
            acc_batch = accuracy.eval(feed_dict={X: X_batch_scaled, y: y_batch})
            acc_valid = accuracy.eval(feed_dict={X: X_val_scaled, y: y_valid})
            print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

New

x_val_means = X_valid.mean(axis=1, keepdims=True)
x_val_stds = X_valid.std(axis=1, keepdims=True) + 1e-10
X_val_scaled = (X_valid - x_val_means) / x_val_stds

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            batch_means = X_batch.mean(axis=1, keepdims=True)
            batch_stds = X_batch.std(axis=1, keepdims=True) + 1e-10
            X_batch_scaled = (X_batch - batch_means) / batch_stds
            sess.run(training_op, feed_dict={X: X_batch_scaled, y: y_batch})
        if epoch % 5 == 0:
            acc_batch = accuracy.eval(feed_dict={X: X_batch_scaled, y: y_batch})
            acc_valid = accuracy.eval(feed_dict={X: X_val_scaled, y: y_valid})
            print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

Am I wrong? If my version is not wrong, is there a reason you did it that way that I missed?

Thanks.

ageron commented 6 years ago

Hi @alogblog, thanks for your question. The original is correct.

To understand why feature standardization is important when using gradient descent, please read Chapter 4 on gradient descent; there is a section that explains why features should be scaled.

Moreover, at test time we might get a single instance at a time: it would not make sense to standardize a single instance using its own mean and std deviation (computed across its features). Instead, we use the training set's mean and std deviation at test time to scale new instances. Therefore, during training we should do the same: use the full training set's mean and std deviation (per feature, i.e., axis=0).
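To make this concrete, here is a small NumPy sketch (not from the book; the data and variable names are illustrative) contrasting the two axes. Per-feature statistics (axis=0) scale a new instance sensibly, while per-instance statistics (axis=1) mix the features together and collapse any single two-feature instance to roughly (-1, 1), discarding the information the model was trained on:

```python
import numpy as np

# Toy training set with two features on very different scales.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=[0.0, 100.0], scale=[1.0, 50.0], size=(1000, 2))

# Per-feature statistics, computed once over the training set (axis=0).
means = X_train.mean(axis=0, keepdims=True)
stds = X_train.std(axis=0, keepdims=True) + 1e-10

# A single new instance at test time: scaling with the training
# statistics keeps each feature comparable to what the model saw.
x_new = np.array([[0.5, 120.0]])
x_scaled = (x_new - means) / stds

# Per-instance statistics (axis=1) are degenerate for one instance:
# any instance with two distinct feature values maps to exactly
# [-1, 1] (sorted), no matter what those values were.
x_bad = (x_new - x_new.mean(axis=1, keepdims=True)) / \
        (x_new.std(axis=1, keepdims=True) + 1e-10)
print(x_bad)  # → [[-1.  1.]]
```

The same degeneracy affects mini-batches to a lesser degree: each batch gets its own shifting statistics, so the network never sees a consistent input distribution.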

Hope this helps, Aurélien

ageron commented 5 years ago

Closing stale issues. Please reopen if you are still experiencing a problem.