THUDM / P-tuning-v2

An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
Apache License 2.0

Question about the implementation of "4.4 Prompt depth" experiment #33

Closed · sdqdlgj closed this issue 2 years ago

sdqdlgj commented 2 years ago

Hi, this repo contains nice code. I wonder how to implement P-tuning-v2 with different prompt depths (i.e., the "Prompt depth" experiment in Section 4.4). The paper says "... we change their attention masks for disallowing their prefix prompts to involve in the computation." May I ask how to change the attention masks in different layers? Is there any example code? Thanks in advance!

Xiao9905 commented 2 years ago

@sdqdlgj Hi,

Thanks for your interest in our work! About your question, I am afraid there isn't an out-of-the-box interface for changing the attention masks. Our implementation modifies some source code in huggingface transformers, so we didn't release it, for simplicity.

But I can give you some hints on implementing the feature. In huggingface's source code, the attention_mask accounts for the past_key_values length at this line:

```python
attention_mask = torch.ones(((batch_size, seq_length + past_key_values_length)), device=device)
```

and it is passed to RobertaSelfAttention in each RobertaLayer at this line:

```python
self_attention_outputs = self.attention(
    hidden_states,
    attention_mask,
    head_mask,
    output_attentions=output_attentions,
    past_key_value=self_attn_past_key_value,
)
```

You can change the attention_mask before it is passed to RobertaSelfAttention, doing so separately for each layer.
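
For illustration, here is a minimal sketch of one way to do this (not our released code; `prefix_len`, `active_layers`, and the helper names are placeholders). It assumes the additive "extended" mask that the model builds, of shape `[batch, 1, 1, prefix_len + seq_len]`, where 0 marks visible key positions and a large negative value marks masked ones; disabling a layer's prefix then amounts to writing that negative value into the first `prefix_len` key positions before calling that layer:

```python
import torch

def mask_prefix(extended_attention_mask: torch.Tensor,
                prefix_len: int,
                mask_value: float = torch.finfo(torch.float32).min) -> torch.Tensor:
    """Copy of the additive mask with the first `prefix_len` key positions
    hidden from every query token."""
    masked = extended_attention_mask.clone()
    masked[..., :prefix_len] = mask_value
    return masked

def per_layer_masks(extended_attention_mask: torch.Tensor,
                    prefix_len: int,
                    num_layers: int,
                    active_layers: set) -> list:
    """One mask per layer: layers in `active_layers` keep their prefix
    visible, all other layers attend only to the real tokens."""
    blocked = mask_prefix(extended_attention_mask, prefix_len)
    return [extended_attention_mask if i in active_layers else blocked
            for i in range(num_layers)]

# Example: 4 prefix tokens, 6 real tokens, a 12-layer encoder, and prompts
# kept only in the top 6 layers (indices 6..11).
ext_mask = torch.zeros(2, 1, 1, 4 + 6)  # everything visible
masks = per_layer_masks(ext_mask, prefix_len=4, num_layers=12,
                        active_layers=set(range(6, 12)))
```

In the encoder's per-layer loop you would then pass `masks[i]` instead of the shared attention_mask when calling layer `i`.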

sdqdlgj commented 2 years ago

Got it! Thanks for your reply and valuable hints.