Atcold / NYU-DLSP20

NYU Deep Learning Spring 2020
https://atcold.github.io/NYU-DLSP20/

Update 08-seq_classification.ipynb #814

Closed: Gaaaavin closed this pull request 2 years ago

Gaaaavin commented 2 years ago

Made changes to lines 507 and 508 of 08-seq_classification.ipynb. The original lines were incorrect, making the model evaluation incorrect.

review-notebook-app[bot] commented 2 years ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Atcold commented 2 years ago

Can you provide some context, plz?

Gaaaavin commented 2 years ago

In the original code, we use

sequence_end = torch.tensor([len(sequence) for sequence in data_decoded]) - 1
output = output[torch.arange(data.shape[0]).long(), sequence_end, :]

to get the output classification from the model. In other words, we treat the len(sequence)-th element in the output as the classification. However, this is not how we trained the model. During training, we always treat the last element in the output as the classification:

# Pick only the output corresponding to last sequence element (input is pre padded)
output = output[:, -1, :]

I think this is the reason why the accuracy in the model evaluation block is very low and doesn't match the training log.
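
For reference, a minimal sketch of what the corrected evaluation indexing could look like, assuming output has shape (batch_size, seq_len, n_classes) as in the notebook; the predicted variable below is illustrative, not the exact notebook code:

# Pick the prediction at the last time step for every sample, mirroring the
# training loop: inputs are pre-padded, so the last position always holds the
# output for the final real token of each sequence.
output = output[:, -1, :]            # shape: (batch_size, n_classes)
predicted = output.argmax(dim=1)     # predicted class index per sample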

Gaaaavin commented 2 years ago

Btw, I've also posted a thread on Campuswire: https://campuswire.com/c/GCEF8E4E7/feed/82

Atcold commented 2 years ago

I see what's going on… https://github.com/Atcold/pytorch-Deep-Learning/pull/19 only partially corrected this issue. Very good, @Gaaaavin, you got extra credit for this!

JonathanSum commented 2 years ago

@Gaaaavin Is it possible to share the Campuswire thread so that the public can learn about it too? Thank you.

I guess the point of this post is that indexing with len(sequence) is wrong, and that treating the last element of the output as the classification is what the training part actually does.

Gaaaavin commented 2 years ago

In the original code, we use

sequence_end = torch.tensor([len(sequence) for sequence in data_decoded]) - 1
output = output[torch.arange(data.shape[0]).long(), sequence_end, :]

to get the output classification from the model. In other words, we treat the len(sequence)-th element in the output as the classification. However, this is not how we trained the model. During training, we always treat the last element in the output as the classification:

# Pick only the output corresponding to last sequence element (input is pre padded)
output = output[:, -1, :]

I think this is the reason why the accuracy in the model evaluation block is very low and doesn't match the training log.

Basically it's the same as this one
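
To illustrate why the last time step is the right one here, a small toy example (the tensors and names below are made up for illustration, not taken from the notebook): with pre-padding the padding sits at the front, so index -1 is the final real token for every sequence, whereas with post-padding you would need the per-sequence index len(sequence) - 1, which is what the original evaluation code computed.

import torch

# Two toy sequences of different lengths, pre-padded to length 4 with 0,
# so the real tokens end at the last position in both rows.
pre_padded = torch.tensor([
    [0, 0, 5, 7],   # real length 2
    [0, 2, 3, 9],   # real length 3
])
print(pre_padded[:, -1])                           # tensor([7, 9])

# With post-padding, the final real token sits at len(sequence) - 1 instead,
# which is the per-sequence index the original evaluation code was using.
post_padded = torch.tensor([
    [5, 7, 0, 0],
    [2, 3, 9, 0],
])
lengths = torch.tensor([2, 3])
print(post_padded[torch.arange(2), lengths - 1])   # tensor([7, 9])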