Hello,
I would like to know how you implement prompts with a depth smaller than the number of model layers. Hugging Face requires the length of past_key_value to match the model's config.n_layers, so I think we can't simply pass a prompt that covers fewer layers as past_key_value. Besides, it seems that layers can't share the same attention_mask if some of them have a prompt and some of them don't.
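To make the attention_mask concern concrete, here is a minimal sketch of the length mismatch I have in mind. It is only shape bookkeeping in plain Python, not your implementation or the Hugging Face API; the helper names (key_lengths, mask_mismatches) and the numbers (6 layers, prefix length 4, sequence length 10, prompts in the top 3 layers) are assumptions for illustration.

```python
# Hypothetical shape bookkeeping only; no real model or library is involved.

def key_lengths(n_layers, prompt_layers, prefix_len, seq_len):
    """Return the key/value length each layer's attention would see."""
    return [
        seq_len + (prefix_len if layer in prompt_layers else 0)
        for layer in range(n_layers)
    ]


def mask_mismatches(lengths, mask_len):
    """Return indices of layers whose key length differs from the shared mask length."""
    return [i for i, n in enumerate(lengths) if n != mask_len]


# Assumed setup: prompts are attached only to the top 3 of 6 layers.
n_layers, prefix_len, seq_len = 6, 4, 10
prompt_layers = {3, 4, 5}

lengths = key_lengths(n_layers, prompt_layers, prefix_len, seq_len)
shared_mask_len = prefix_len + seq_len  # one mask extended to cover the prefix

print(lengths)                                     # [10, 10, 10, 14, 14, 14]
print(mask_mismatches(lengths, shared_mask_len))   # [0, 1, 2]
```

In this sketch, a single mask extended by the prefix length fits only the layers that carry a prompt, which is why I don't see how one attention_mask can be shared across all layers.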
Thanks!