Open baiyuting opened 2 years ago
Thanks for the question. Yes, you are right! I think it's fine for training to set position ids to [0, seqlen), but you might want to be consistent at decoding time; otherwise it may conflict with caching.
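A minimal sketch of that caching interaction, assuming the current transformers API (the PrefixTuning fork may take the cache as `past` rather than `past_key_values`): once keys/values are cached, the next token's position id has to continue from `past_length`, so whatever convention is chosen at training time must be mirrored at decoding time.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tokenizer("Prefix tuning keeps the LM frozen", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, use_cache=True)           # positions 0..n-1 are baked into the cache
    past = out.past_key_values
    past_length = inputs["input_ids"].shape[-1]     # number of positions already cached

    next_token = out.logits[:, -1:].argmax(-1)      # greedily pick the next token
    # Consistent with the cache: the new token sits at position `past_length`.
    step = model(next_token,
                 past_key_values=past,
                 position_ids=torch.tensor([[past_length]]),
                 use_cache=True)
    # Restarting the position ids at 0 here would disagree with the positions
    # already encoded in the cached keys/values.
```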
Hi, I found that this causes an error:
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
For example: past_length == 10, input_shape[-1] == 1024.
I guess the error is caused by position_ids being changed to [10, 1034), while tokenizer.model_max_length is 1024. I didn't notice any other operations on the input_ids length, model_max_length, or position_ids. Could you help me with this error?
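A minimal sketch of where that error comes from, assuming stock GPT-2 sizes: the learned position embedding table `wpe` has only `n_positions = 1024` rows, so ids 1024..1033 index past the end of the table.

```python
import torch
from transformers import GPT2Config, GPT2Model

config = GPT2Config()                    # n_positions defaults to 1024
model = GPT2Model(config)
print(model.wpe.weight.shape)            # torch.Size([1024, 768])

past_length, seq_len = 10, 1024
position_ids = torch.arange(past_length, past_length + seq_len)   # 10 .. 1033
try:
    model.wpe(position_ids)              # ids 1024..1033 are out of range
except IndexError as e:
    print("out of range:", e)
```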
I found that the position ids are in [prefix_len, prefix_len + seq_len) in modeling_gpt2.py:
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
https://github.com/XiangLi1999/PrefixTuning/blob/6519d30e69b15a180f23e2cd41b766d3f62b8e82/transformers/src/transformers/modeling_gpt2.py#L579
Is it OK to just make the position ids [0, seq_len), since I have not found any use of position embeddings for the prefix matrix?
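If one does go with [0, seq_len), a minimal sketch of what that could look like (`make_position_ids` and `prefix_past` are illustrative names, not part of this repo): since the prefix is injected as past key/values and never passes through `wpe`, the real tokens' position ids can start at 0 instead of prefix_len, as long as generation uses the same convention.

```python
import torch

def make_position_ids(input_ids, prefix_len, start_at_zero=True):
    # Position ids for the real tokens only; the prefix has no positions of its own.
    seq_len = input_ids.shape[-1]
    start = 0 if start_at_zero else prefix_len
    return torch.arange(start, start + seq_len, device=input_ids.device).unsqueeze(0)

# outputs = model(input_ids,
#                 past_key_values=prefix_past,   # the prefix matrix, passed as cached key/values
#                 position_ids=make_position_ids(input_ids, prefix_len))
```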