Right now, the labels are formatted as one-hot vectors: shape is [batch_size, max_seq_len, num_classes]. This makes the loss CategoricalCrossentropy.
In the original examples, the labels are given as plain integers: shape is [batch_size, max_seq_len, 1]. This makes the loss SparseCategoricalCrossentropy.
The two should compute the same value (sparse just indexes instead of dotting with a one-hot vector), but it is worth experimenting to confirm.
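A minimal sketch of the comparison, assuming a TensorFlow/Keras setup (the shapes and random data here are illustrative, not from the actual pipeline): the same logits are scored once against one-hot labels with CategoricalCrossentropy and once against integer labels with SparseCategoricalCrossentropy.

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes, not the real model's dimensions.
batch_size, max_seq_len, num_classes = 2, 5, 4

rng = np.random.default_rng(0)

# Integer labels: shape [batch_size, max_seq_len, 1]
int_labels = rng.integers(0, num_classes, size=(batch_size, max_seq_len, 1))

# One-hot labels: shape [batch_size, max_seq_len, num_classes]
onehot_labels = tf.one_hot(np.squeeze(int_labels, axis=-1), depth=num_classes)

# Random logits standing in for model output.
logits = rng.normal(size=(batch_size, max_seq_len, num_classes)).astype("float32")

cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

loss_onehot = cce(onehot_labels, logits).numpy()
loss_sparse = scce(int_labels, logits).numpy()

# The two losses should agree up to floating-point error.
print(loss_onehot, loss_sparse)
```

If the two printed values match, the only practical difference is memory and preprocessing cost: the sparse variant avoids materializing the num_classes-wide one-hot axis.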