SeanNaren opened this issue 2 years ago
Hi @SeanNaren, I'm looking to contribute to ML/software projects and have been using pytorch-lightning
myself (read: I'm a fan!). Can you tell me where to get started on this issue? I'd like to scope whether I can devote some of my time to fixing this one.
@SeanNaren would you have some pointers on how/where to start? :rabbit:
Hi @SeanNaren, @Borda, I think here is what is being asked to be modified.
I referred to this example. Here, we use `TranslationTransformer` for training, and it inherits from `Seq2SeqTransformer`. If we look at this line, we see that the output is `loss, logits`; however, the loss there is calculated taking the padding tokens into account.
I found the answer for how to solve it; it is described by the Hugging Face community here.
So, I guess the change to be made is (in simple language): at that same line, i.e. here, replace the padding token id in the labels, changing it from 0 to -100, since the loss uses `ignore_index = -100` by default.
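As a minimal sketch of that replacement (the standalone helper and its name `mask_padding_in_labels` are my own illustration, not the library's API; the real change would go in the line linked above):

```python
import torch

def mask_padding_in_labels(labels: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # Hypothetical helper: replace padding token ids in the labels with -100,
    # the default ignore_index of torch.nn.CrossEntropyLoss, so that padded
    # positions are skipped when the loss is computed.
    labels = labels.clone()
    labels[labels == pad_token_id] = -100
    return labels
```

Usage would be something like `labels = mask_padding_in_labels(batch["labels"], tokenizer.pad_token_id)` before the labels are passed to the model.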
Hope this helps in solving the issue.
@spranjal25 are you fine with @uakarsh's suggestion?
Yes, that's very helpful. I think I can get started on it. I will pick this up as soon as I get some free time. Are we looking at a timeline here though, @Borda?
no particular rush :)
🐛 Bug
When using the Translation Task, we need to ensure that we skip padding tokens within the loss calculation. Currently we do not replace the padding tokens with -100 in the labels, so they contribute to the loss, which could be detrimental.
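For illustration, a small self-contained sketch (toy tensors of my own, not the Task's actual code) of how padding tokens skew the loss unless they are mapped to -100, which `torch.nn.functional.cross_entropy` ignores by default:

```python
import torch
import torch.nn.functional as F

# Toy batch: 1 sequence, 4 positions, vocabulary of 5 tokens.
logits = torch.randn(1, 4, 5)
labels = torch.tensor([[2, 3, 0, 0]])  # last two positions are padding (pad_token_id = 0)

# Naive loss: the padding positions are averaged into the loss.
naive = F.cross_entropy(logits.view(-1, 5), labels.view(-1))

# Masked loss: padding replaced with -100, the default ignore_index.
masked_labels = labels.masked_fill(labels == 0, -100)
masked = F.cross_entropy(logits.view(-1, 5), masked_labels.view(-1))

print(naive.item(), masked.item())  # the values differ; padding skewed the first
```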