YuZhang10 opened 7 months ago

Hi, I noticed you use `padding_side='right'` in training but `'left'` in eval. In my previous experience, `padding_side` is usually set to `'left'` for generation models (as stated in this link). Looking forward to your reply, thanks in advance.

---

@YuZhang10 Hello, the position encoding used by LLaMA is RoPE, a relative position encoding, so it makes no difference whether left or right padding is used during training. During autoregressive generation, however, each newly generated token is appended to the end of the sequence. With right padding it would be appended after the pad tokens, which is incorrect. Therefore, be sure to use left padding during inference.
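To make the point above concrete, here is a minimal sketch in plain Python (no `transformers` dependency, and not code from this repo) showing where a newly generated token lands in a batch under each padding side:

```python
# Minimal sketch of why batched autoregressive generation needs left padding.
# PAD, left_pad, right_pad, and the sample batch are illustrative, not repo code.
PAD = "<pad>"

def left_pad(tokens, length):
    """Pad on the left so the real tokens end at the sequence's last position."""
    return [PAD] * (length - len(tokens)) + tokens

def right_pad(tokens, length):
    """Pad on the right, pushing pad tokens between the text and any appended token."""
    return tokens + [PAD] * (length - len(tokens))

batch = [["I", "like"], ["You", "are", "very", "kind"]]
max_len = max(len(seq) for seq in batch)

# Right padding: the new token is appended AFTER the pads, so it no longer
# directly follows the sentence it should continue.
right = [right_pad(seq, max_len) + ["<new>"] for seq in batch]
print(right[0])  # ['I', 'like', '<pad>', '<pad>', '<new>']

# Left padding: the new token correctly extends the sentence.
left = [left_pad(seq, max_len) + ["<new>"] for seq in batch]
print(left[0])   # ['<pad>', '<pad>', 'I', 'like', '<new>']
```

In practice, with Hugging Face tokenizers this corresponds to setting `tokenizer.padding_side = "left"` before calling `generate` on a batch.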