Closed: AdarshKumar712 closed this issue 3 years ago
@AdarshKumar712 I made a patch to fix it. Try again with the master branch.
Thanks for such a quick response.
It works fine the first time, but if I apply it again with the updated past_key_values, the same error repeats. The code I am working on actually requires iterating this step multiple times.
Here is the code to replicate it (if needed):
using Transformers.HuggingFace

model = hgf"gpt2:lmheadmodel"
tokens = reshape(Array(1:10), (:, 1));

# first pass: no cache yet, builds past_key_values
outputs = model(tokens; position_ids=nothing, token_type_ids=nothing,
                past_key_values=nothing,
                attention_mask=nothing,
                output_attentions=true,
                output_hidden_states=true,
                use_cache=true);

# second pass: feed the cache back in (works after the first patch)
output_new = model(tokens; position_ids=nothing, token_type_ids=nothing,
                   past_key_values=outputs.past_key_values,
                   attention_mask=nothing,
                   output_attentions=true,
                   output_hidden_states=true,
                   use_cache=true);

# third pass: feed the updated cache in again (this is where the error repeats)
output_new_1 = model(tokens; position_ids=nothing, token_type_ids=nothing,
                     past_key_values=output_new.past_key_values,
                     attention_mask=nothing,
                     output_attentions=true,
                     output_hidden_states=true,
                     use_cache=true);
My bad. The new master should fix it.
Yes, now it works properly. Thanks a lot!
@AdarshKumar712 By the way, could you also make a simple PR for a text generation example with GPT-2?
Sure. I will start working on the PR.
Just one thing: for the tokenizer, should I use the BPE tokenizer available with load_pretrain("GPT-OpenAIftlm", ...), along with its vocab? The vocab we get from OpenAI has around 40k entries, but the HuggingFace model expects a 50k vocab.
When I try to pass the precomputed past_key_values to the HuggingFace gpt2 model, I get the following error:
I'm using Transformers 0.1.8 with Flux 0.11.6 on Julia 1.6.0. Minimal code to replicate the above error:
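(A minimal sketch, reusing the same calls as the longer snippet shown above; the exact kwargs may differ from the original report.)

using Transformers.HuggingFace

model = hgf"gpt2:lmheadmodel"
tokens = reshape(Array(1:10), (:, 1));

# first pass builds the key/value cache
outputs = model(tokens; position_ids=nothing, token_type_ids=nothing,
                past_key_values=nothing, attention_mask=nothing,
                output_attentions=true, output_hidden_states=true,
                use_cache=true);

# second pass feeds the cache back in; this call raises the error
model(tokens; position_ids=nothing, token_type_ids=nothing,
      past_key_values=outputs.past_key_values, attention_mask=nothing,
      output_attentions=true, output_hidden_states=true,
      use_cache=true);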
I think it's because the past key values are concatenated with the present key values, making the shape [head_features, 2*seq_len, num_heads, batch]. However, the attention_mask is being applied with respect to the original shape:
https://github.com/chengchingwen/Transformers.jl/blob/a013291bc86ada560ad18c99ec6e2e2d5a04c748/src/huggingface/models/gpt2.jl#L93-L97
https://github.com/chengchingwen/Transformers.jl/blob/a013291bc86ada560ad18c99ec6e2e2d5a04c748/src/huggingface/models/bert.jl#L123-L131
Here the attention_mask expects the attention_scores shape to be [seq_len, seq_len, num_heads, batch], but after concatenation it is getting [2*seq_len, seq_len, num_heads, batch].
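For illustration, a minimal sketch of that mismatch with plain arrays (the sizes and variable names are made up for the example; this is not the actual Transformers.jl masking code):

seq_len, past_len, num_heads, batch = 10, 10, 12, 1

# attention scores without a cache: [seq_len, seq_len, num_heads, batch]
scores = randn(Float32, seq_len, seq_len, num_heads, batch)
# scores after concatenating past and present keys: first dim grows to past_len + seq_len
cached_scores = randn(Float32, past_len + seq_len, seq_len, num_heads, batch)
# mask built from the current input only: [seq_len, seq_len, 1, batch]
mask = ones(Float32, seq_len, seq_len, 1, batch)

scores .+ (1f0 .- mask) .* -1f4              # broadcasts fine
try
    cached_scores .+ (1f0 .- mask) .* -1f4   # 2*seq_len vs seq_len along dim 1
catch err
    println(err)                             # DimensionMismatch
end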
@chengchingwen Could you please take a look at this?