Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ Pytorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Ignore padding tokens when using Translation Task + `padding='max_length'` #266

Open · SeanNaren opened this issue 2 years ago

SeanNaren commented 2 years ago

🐛 Bug

When using the Translation Task, we need to ensure that padding tokens are skipped in the loss calculation. Currently we do not replace the padding token ids in the labels with -100, which could be detrimental to training.
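For context, the convention in 🤗 Transformers is that label positions set to -100 are ignored by the loss. A minimal sketch of the kind of masking the task would need (the function name is illustrative, not existing lightning-transformers code):

```python
import torch


def ignore_padding_in_labels(labels: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Return a copy of `labels` where padding positions are set to -100.

    -100 is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, which the
    🤗 Transformers models use internally, so these positions contribute nothing
    to the loss.
    """
    labels = labels.clone()
    labels[labels == pad_token_id] = -100
    return labels
```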

spranjal25 commented 2 years ago

Hi @SeanNaren, I'm looking to contribute to ML/software projects and have been using pytorch-lightning myself (read: I'm a fan!). Can you tell me where to get started with this issue? I'd like to scope out whether I can devote some of my time to fixing this one.

Borda commented 2 years ago

@SeanNaren would you have some pointers on how/where to start? :rabbit:

uakarsh commented 1 year ago

Hi @SeanNaren, @Borda, I think this is what needs to be modified.

I referred to this example. There, TranslationTransformer is used for training, and it inherits from Seq2SeqTransformer. If we look at this line, we see that the output is loss, logits; however, the loss is calculated with the padding tokens taken into account.

The Hugging Face community describes how to solve this here.
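(For reference, one approach commonly suggested in those discussions is to let `DataCollatorForSeq2Seq` pad the labels with -100 directly, instead of relying on `padding='max_length'` in the tokenizer. A sketch, with the checkpoint name as a placeholder:)

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# `label_pad_token_id=-100` pads the labels with -100 rather than the tokenizer's
# pad token, so padded label positions are excluded from the loss automatically.
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=-100,
)
```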

So, in simple terms, I guess the change to be made in that same line, i.e. here, is to:

  1. Obtain the loss and logits from the common step.
  2. Initialize the cross-entropy loss with `ignore_index=-100`.
  3. Set the target token positions that equal the padding token id (0 here) to -100.
  4. Calculate the final loss and then perform the remaining steps as usual.
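A rough sketch of those four steps (assuming `logits` and `labels` are shaped as in the usual 🤗 seq2seq models; the function and argument names are illustrative, not the existing lightning-transformers API):

```python
import torch
import torch.nn as nn


def compute_loss_ignoring_padding(logits: torch.Tensor,
                                  labels: torch.Tensor,
                                  pad_token_id: int = 0) -> torch.Tensor:
    # 1. `logits` / `labels` come from the common step.
    # 2. Cross-entropy with ignore_index=-100 (also PyTorch's default).
    loss_fct = nn.CrossEntropyLoss(ignore_index=-100)
    # 3. Replace target positions equal to the padding token id (0 for e.g. the
    #    T5 tokenizer) with -100 so they are skipped.
    labels = labels.clone()
    labels[labels == pad_token_id] = -100
    # 4. Flatten over the vocabulary dimension and compute the final loss as usual.
    return loss_fct(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
```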

Hope this helps in solving the issue.

Borda commented 1 year ago

@spranjal25 are you fine with @uakarsh's suggestion?

spranjal25 commented 1 year ago

> @spranjal25 are you fine with @uakarsh's suggestion?

Yes, that's very helpful. I think I can get started on it. Will pick this up as soon as I get some free time. Are we looking at a timeline here though, @Borda?

Borda commented 1 year ago

no special rush :)