eole-nlp / eole

Open language modeling toolkit based on PyTorch
https://eole-nlp.github.io/eole
MIT License

fixed mismatch between mask and batch dimensions #6

Closed l-k-11235 closed 4 months ago

l-k-11235 commented 4 months ago

The 'zero-out-prompt-loss' option is broken because of a mismatch between the mask and the tgt side of the batch. I have tested the fix on a simple example:

Bonjour les amis ### Response: ⦅newline⦆bonjour !

I printed the following in the ignore_prompt method of the LossCompute class:

# mask 
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]], device='cuda:0')
# batch["tgt"] before masking
tensor([[ 82682,   3626,  87893,  17011,  94768,     26,    721,    189,   6099,
          30363,    759, 128002]], device='cuda:0')
# batch["tgt"] after masking
tensor([[   189,    189,    189,    189,    189,    189,    189,    189,   6099,
          30363,    759, 128002]], device='cuda:0')
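The tensors above show the intended behavior: every prompt position (mask value 0) is overwritten with the padding token, which appears to be id 189 here, so the loss skips the prompt and only the response tokens survive. A minimal sketch of that masking step, assuming a simplified standalone function rather than the actual LossCompute.ignore_prompt signature:

```python
import torch

def ignore_prompt(tgt: torch.Tensor, mask: torch.Tensor, padding_idx: int) -> torch.Tensor:
    # Replace prompt positions (mask == 0) with the padding index so the
    # loss computation ignores them; response positions (mask == 1) keep
    # their original token ids.
    return tgt.masked_fill(mask == 0, padding_idx)

# Values taken from the printed tensors above (padding_idx=189 is an
# assumption inferred from the "after masking" output).
tgt = torch.tensor([[82682, 3626, 87893, 17011, 94768, 26, 721, 189,
                     6099, 30363, 759, 128002]])
mask = torch.tensor([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]])

masked = ignore_prompt(tgt, mask, padding_idx=189)
print(masked)
# tensor([[   189,    189,    189,    189,    189,    189,    189,    189,   6099,
#          30363,    759, 128002]])
```

The key point of the fix is that mask and tgt must share the same shape (here both (1, 12)) before masked_fill is applied; a dimension mismatch between the two is what broke the option.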