aamini / introtodeeplearning

Lab Materials for MIT 6.S191: Introduction to Deep Learning
MIT License

Lab 2, Part 1, Section 1.4: Missing `from_logits=True` argument? #122

Closed · BellaCoola closed this 1 year ago

BellaCoola commented 1 year ago

Hello, I am looking at "Lab 2, Part 1: MNIST Digit Classification". In section "1.4 Training the model 2.0", there is the following code block:

# (tf, mdl, tqdm, train_images, train_labels, and build_cnn_model all come from earlier notebook cells)

# Rebuild the CNN model
cnn_model = build_cnn_model()

batch_size = 12
loss_history = mdl.util.LossHistory(smoothing_factor=0.95) # to record the evolution of the loss
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss', scale='semilogy')
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2) # define our optimizer

if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists

for idx in tqdm(range(0, train_images.shape[0], batch_size)):
  # First grab a batch of training data and convert the input images to tensors
  (images, labels) = (train_images[idx:idx+batch_size], train_labels[idx:idx+batch_size])
  images = tf.convert_to_tensor(images, dtype=tf.float32)

  # GradientTape to record differentiation operations
  with tf.GradientTape() as tape:
    # Feed the images into the model and obtain the predictions
    logits = cnn_model(images)

    # Compute the sparse categorical cross entropy loss
    loss_value = tf.keras.backend.sparse_categorical_crossentropy(labels, logits)

  loss_history.append(loss_value.numpy().mean()) # append the loss to the loss_history record
  plotter.plot(loss_history.get())

  # Backpropagation: use the tape to compute the gradient with respect to all
  # trainable parameters of the CNN, then apply the gradient update
  grads = tape.gradient(loss_value, cnn_model.trainable_variables)
  optimizer.apply_gradients(zip(grads, cnn_model.trainable_variables))
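
For reference, build_cnn_model is defined earlier in the same notebook; roughly the following (exact layer sizes may differ, but the final layer is the part relevant to my question):

def build_cnn_model():
    cnn_model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters=24, kernel_size=(3, 3), activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(filters=36, kernel_size=(3, 3), activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=tf.nn.relu),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    return cnn_model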

Shouldn't the tf.keras.backend.sparse_categorical_crossentropy() call also set the from_logits parameter to True? (By default it is False.) If not, why not?

TonyHanzhiSU commented 1 year ago

If you look at the definition of cnn_model, you can see that the final Dense layer uses a softmax activation. As a result, cnn_model already outputs normalized probabilities rather than raw logits, which is exactly what sparse_categorical_crossentropy expects with its default from_logits=False. (The variable name logits in the snippet is a bit misleading: the tensor actually holds probabilities.)
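
To make this concrete, here is a minimal standalone sketch (not the lab's code) showing that passing probabilities with the default from_logits=False and passing raw logits with from_logits=True compute the same loss:

import tensorflow as tf

labels = tf.constant([3, 1, 4])
logits = tf.random.normal([3, 10])   # raw, unnormalized class scores
probs = tf.nn.softmax(logits)        # what a softmax output layer produces

# Model ends in softmax -> pass probabilities, keep the default from_logits=False
loss_probs = tf.keras.backend.sparse_categorical_crossentropy(labels, probs)

# Model has no final activation -> pass raw logits and set from_logits=True
loss_logits = tf.keras.backend.sparse_categorical_crossentropy(
    labels, logits, from_logits=True)

# The two per-example losses agree up to floating-point error
print(tf.reduce_max(tf.abs(loss_probs - loss_logits)).numpy())

Conversely, setting from_logits=True on the lab's model would apply softmax a second time, which silently flattens the predictions and distorts the loss.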

BellaCoola commented 1 year ago

Thank you very much :)