ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Apache License 2.0

Chapter 11. SELU example #263

Closed: alogblog closed this 5 years ago

alogblog commented 6 years ago

Original

means = X_train.mean(axis=0, keepdims=True)
stds = X_train.std(axis=0, keepdims=True) + 1e-10
X_val_scaled = (X_valid - means) / stds

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            X_batch_scaled = (X_batch - means) / stds
            sess.run(training_op, feed_dict={X: X_batch_scaled, y: y_batch})
        if epoch % 5 == 0:
            acc_batch = accuracy.eval(feed_dict={X: X_batch_scaled, y: y_batch})
            acc_valid = accuracy.eval(feed_dict={X: X_val_scaled, y: y_valid})
            print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

New

x_val_means = X_valid.mean(axis=1, keepdims=True)
x_val_stds = X_valid.std(axis=1, keepdims=True) + 1e-10
X_val_scaled = (X_valid - x_val_means) / x_val_stds

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            batch_means = X_batch.mean(axis=1, keepdims=True)
            batch_stds = X_batch.std(axis=1, keepdims=True) + 1e-10
            X_batch_scaled = (X_batch - batch_means) / batch_stds
            sess.run(training_op, feed_dict={X: X_batch_scaled, y: y_batch})
        if epoch % 5 == 0:
            acc_batch = accuracy.eval(feed_dict={X: X_batch_scaled, y: y_batch})
            acc_valid = accuracy.eval(feed_dict={X: X_val_scaled, y: y_valid})
            print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

Am I wrong? If my version is not wrong, is there a reason you did it that way that I missed?

Thanks.

ageron commented 6 years ago

Hi @alogblog, thanks for your question. The original is correct.

To understand why feature standardization is important when using gradient descent, please read Chapter 4 on gradient descent; there is a section that explains why features should be scaled.

Moreover, at test time we might get a single instance at a time: it would not make sense to standardize a single instance using its own mean and std deviation (computed across its features). Instead, we use the training set's mean and std deviation at test time to scale new instances. Therefore, during training we should do the same: use the full training set's mean and std deviation (per feature, i.e., axis=0).
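To make this concrete, here is a small NumPy sketch (not from the book; the data and variable names are illustrative) contrasting the two axes. Per-feature statistics (axis=0) scale a new instance sensibly, while per-instance statistics (axis=1) mix the features together and collapse any single two-feature instance to roughly (-1, 1), discarding the information the model was trained on:

```python
import numpy as np

# Toy training set with two features on very different scales.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=[0.0, 100.0], scale=[1.0, 50.0], size=(1000, 2))

# Per-feature statistics, computed once over the training set (axis=0).
means = X_train.mean(axis=0, keepdims=True)
stds = X_train.std(axis=0, keepdims=True) + 1e-10

# A single new instance at test time: scaling with the training
# statistics keeps each feature comparable to what the model saw.
x_new = np.array([[0.5, 120.0]])
x_scaled = (x_new - means) / stds

# Per-instance statistics (axis=1) are degenerate for one instance:
# any instance with two distinct feature values maps to exactly
# [-1, 1] (sorted), no matter what those values were.
x_bad = (x_new - x_new.mean(axis=1, keepdims=True)) / \
        (x_new.std(axis=1, keepdims=True) + 1e-10)
print(x_bad)  # → [[-1.  1.]]
```

The same degeneracy affects mini-batches to a lesser degree: each batch gets its own shifting statistics, so the network never sees a consistent input distribution.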

Hope this helps, Aurélien

ageron commented 5 years ago

Closing stale issues. Please reopen if you are still experiencing a problem.