Open baiyuting opened 2 years ago
Thanks for the question. Yes, you are right! I think it's fine for training to set position ids to [0, seqlen), but you might want to be consistent at decoding time; otherwise it may conflict with caching.
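A minimal sketch of that caching interaction, assuming the current transformers API (the PrefixTuning fork may take the cache as `past` rather than `past_key_values`): once keys/values are cached, the next token's position id has to continue from `past_length`, so whatever convention is chosen at training time must be mirrored at decoding time.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tokenizer("Prefix tuning keeps the LM frozen", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, use_cache=True)           # positions 0..n-1 are baked into the cache
    past = out.past_key_values
    past_length = inputs["input_ids"].shape[-1]     # number of positions already cached

    next_token = out.logits[:, -1:].argmax(-1)      # greedily pick the next token
    # Consistent with the cache: the new token sits at position `past_length`.
    step = model(next_token,
                 past_key_values=past,
                 position_ids=torch.tensor([[past_length]]),
                 use_cache=True)
    # Restarting the position ids at 0 here would disagree with the positions
    # already encoded in the cached keys/values.
```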
Hi, I found that this causes an error:
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
For example: past_length == 10, input_shape[-1] == 1024.
I guess the error is caused by position_ids being changed to [10, 1034), while tokenizer.model_max_length is 1024. I didn't notice any other operations on the input_ids length, model_max_length, or position_ids. Could you help me with this error?
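A minimal sketch of where that error comes from, assuming stock GPT-2 sizes: the learned position embedding table `wpe` has only `n_positions = 1024` rows, so ids 1024..1033 index past the end of the table.

```python
import torch
from transformers import GPT2Config, GPT2Model

config = GPT2Config()                    # n_positions defaults to 1024
model = GPT2Model(config)
print(model.wpe.weight.shape)            # torch.Size([1024, 768])

past_length, seq_len = 10, 1024
position_ids = torch.arange(past_length, past_length + seq_len)   # 10 .. 1033
try:
    model.wpe(position_ids)              # ids 1024..1033 are out of range
except IndexError as e:
    print("out of range:", e)
```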
I found that the position ids are in [prefix_len, prefix_len + seq_len) in modeling_gpt2.py:
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
https://github.com/XiangLi1999/PrefixTuning/blob/6519d30e69b15a180f23e2cd41b766d3f62b8e82/transformers/src/transformers/modeling_gpt2.py#L579
Is it OK to just make the position ids [0, seq_len), since I have not found any use of position embeddings for the prefix matrix?
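If one does go with [0, seq_len), a minimal sketch of what that could look like (`make_position_ids` and `prefix_past` are illustrative names, not part of this repo): since the prefix is injected as past key/values and never passes through `wpe`, the real tokens' position ids can start at 0 instead of prefix_len, as long as generation uses the same convention.

```python
import torch

def make_position_ids(input_ids, prefix_len, start_at_zero=True):
    # Position ids for the real tokens only; the prefix has no positions of its own.
    seq_len = input_ids.shape[-1]
    start = 0 if start_at_zero else prefix_len
    return torch.arange(start, start + seq_len, device=input_ids.device).unsqueeze(0)

# outputs = model(input_ids,
#                 past_key_values=prefix_past,   # the prefix matrix, passed as cached key/values
#                 position_ids=make_position_ids(input_ids, prefix_len))
```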