chengchingwen / Transformers.jl

Julia Implementation of Transformer models

Dimension mismatch error when passing past_key_values in HuggingFace gpt2 (attention_mask size not matching) #55

Closed. AdarshKumar712 closed this issue 3 years ago.

AdarshKumar712 commented 3 years ago

When I try to pass precomputed past_key_values to the HuggingFace gpt2 model, I get the following error:

DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 20 and 10")

Stacktrace:
  [1] _bcs1
    @ ./broadcast.jl:501 [inlined]
  [2] _bcs(shape::NTuple{4, Base.OneTo{Int64}}, newshape::NTuple{4, Base.OneTo{Int64}})
    @ Base.Broadcast ./broadcast.jl:495
  [3] broadcast_shape
    @ ./broadcast.jl:489 [inlined]
  [4] combine_axes
    @ ./broadcast.jl:484 [inlined]
  [5] instantiate
    @ ./broadcast.jl:266 [inlined]
  [6] materialize
    @ ./broadcast.jl:883 [inlined]
  [7] _compute_attention_scores(query_layer::Array{Float32, 4}, key_layer::Array{Float32, 4}, attention_mask::Array{Float32, 4})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/bert.jl:127
  [8] _attn(query::Array{Float32, 4}, key::Array{Float32, 4}, value::Array{Float32, 4}, attention_mask::Array{Float32, 4})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:101
  [9] (::Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}})(query::Array{Float32, 4}, key::Array{Float32, 4}, value::Array{Float32, 4}, attention_mask::Array{Float32, 4}, #unused#::Val{true}, #unused#::Val{true})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:116
 [10] (::Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}})(x::Array{Float32, 3}, past::Tuple{Array{Float32, 4}, Array{Float32, 4}}, attention_mask::Array{Float32, 4}, _output_attentions::Val{true}, _use_cache::Val{true})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:97
 [11] (::Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(NNlib.gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}})(x::Array{Float32, 3}, past::Tuple{Array{Float32, 4}, Array{Float32, 4}}, attention_mask::Array{Float32, 4}, _output_attentions::Val{true}, _use_cache::Val{true})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:202
 [12] macro expansion
    @ ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:380 [inlined]
 [13] (::Transformers.HuggingFace.HGFGPT2Model{12, Transformers.HuggingFace.FakeTHModuleList{12, NTuple{12, Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(NNlib.gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}})(input::Matrix{Int64}, position_ids::Matrix{Int64}, token_type_ids::Matrix{Int64}, past::NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}, attention_mask::Nothing, _output_attentions::Val{true}, _output_hidden_states::Val{true}, _use_cache::Val{true})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:298
 [14] (::Transformers.HuggingFace.HGFGPT2LMHeadModel{Transformers.HuggingFace.HGFGPT2Model{12, Transformers.HuggingFace.FakeTHModuleList{12, NTuple{12, Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(NNlib.gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}}, Transformers.HuggingFace.FakeTHLinear{Matrix{Float32}, Nothing}})(input::Matrix{Int64}, position_ids::Matrix{Int64}, token_type_ids::Matrix{Int64}, past::NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}, attention_mask::Nothing, _output_attentions::Val{true}, _output_hidden_states::Val{true}, _use_cache::Val{true})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:440
 [15] (::Transformers.HuggingFace.HGFGPT2LMHeadModel{Transformers.HuggingFace.HGFGPT2Model{12, Transformers.HuggingFace.FakeTHModuleList{12, NTuple{12, Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(NNlib.gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}}, Transformers.HuggingFace.FakeTHLinear{Matrix{Float32}, Nothing}})(input::Matrix{Int64}; position_ids::Matrix{Int64}, token_type_ids::Matrix{Int64}, past_key_values::NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}, attention_mask::Nothing, output_attentions::Bool, output_hidden_states::Bool, use_cache::Bool)
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/UdOEB/src/huggingface/models/gpt2.jl:416
 [16] top-level scope
    @ In[23]:1
 [17] eval
    @ ./boot.jl:360 [inlined]
 [18] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1094

I'm using Transformers 0.1.8 with Flux 0.11.6 on Julia 1.6.0.

Minimal code to replicate the above error:

using Transformers.HuggingFace
model = hgf"gpt2:lmheadmodel"
tokens = reshape(Array(1:10), (:, 1));  # 10 token ids, batch size 1
# First call: no cache yet, runs fine and returns past_key_values.
outputs = model(tokens; position_ids=nothing, token_type_ids=nothing,
                                    past_key_values=nothing,
                                    attention_mask=nothing,
                                    output_attentions=true,
                                    output_hidden_states=true,
                                    use_cache=true);
# Second call: feed the cache back in -- this throws the DimensionMismatch above.
output_new = model(tokens; position_ids=nothing, token_type_ids=nothing,
                                    past_key_values=outputs.past_key_values,
                                    attention_mask=nothing,
                                    output_attentions=true,
                                    output_hidden_states=true,
                                    use_cache=true);
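
For reference, the returned cache is one (key, value) pair per transformer layer; checking its type gives the structure already visible in the stack trace above:

typeof(outputs.past_key_values)
# NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}
# each array shaped [head_features, seq_len, num_heads, batch] (the shape convention discussed below)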

I think it's because the past key values are concatenated with the present key values, making the key shape [head_features, 2*seq_len, num_heads, batch], while the attention_mask being applied still corresponds to the original seq_len:

https://github.com/chengchingwen/Transformers.jl/blob/a013291bc86ada560ad18c99ec6e2e2d5a04c748/src/huggingface/models/gpt2.jl#L93-L97

https://github.com/chengchingwen/Transformers.jl/blob/a013291bc86ada560ad18c99ec6e2e2d5a04c748/src/huggingface/models/bert.jl#L123-L131

Here the attention_mask expects the attention_scores to have shape [seq_len, seq_len, num_heads, batch], but after concatenation they have shape [2*seq_len, seq_len, num_heads, batch].
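
Broadcasting a mask built for the original length against scores whose first (key) dimension has doubled reproduces exactly this error. A minimal standalone sketch with the sizes from this report (seq_len = 10, num_heads = 12, batch = 1; illustrative only, not the library code):

scores = rand(Float32, 20, 10, 12, 1)  # attention scores after cat(past_key, key): key_len = 2*seq_len
mask = rand(Float32, 10, 10, 1, 1)     # mask still sized for the original seq_len
scores .* mask  # DimensionMismatch: "... got a dimension with lengths 20 and 10"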

@chengchingwen Can you please have a look into this?

chengchingwen commented 3 years ago

@AdarshKumar712 I made a patch to fix it. Try again with the master branch.

AdarshKumar712 commented 3 years ago

Thanks for such a quick response.

It works fine the first time, but if I apply it again with the updated past_key_values, the same error repeats. The code I am working on requires iterating this step multiple times.

Here is the code to replicate (if needed):

using Transformers.HuggingFace
model = hgf"gpt2:lmheadmodel"
tokens = reshape(Array(1:10),(:,1));
outputs = model(tokens; position_ids=nothing, token_type_ids=nothing,
                                    past_key_values=nothing,
                                    attention_mask=nothing,
                                    output_attentions=true,
                                    output_hidden_states=true,
                                    use_cache=true);
output_new = model(tokens; position_ids=nothing, token_type_ids=nothing,
                                    past_key_values=outputs.past_key_values,
                                    attention_mask=nothing,
                                    output_attentions=true,
                                    output_hidden_states=true,
                                    use_cache=true);
# Third call, with the cache from the second call -- the same error reappears here.
output_new_1 = model(tokens; position_ids=nothing, token_type_ids=nothing,
                                    past_key_values=output_new.past_key_values,
                                    attention_mask=nothing,
                                    output_attentions=true,
                                    output_hidden_states=true,
                                    use_cache=true);
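
The pattern I need is essentially this loop: keep feeding the cache from one call into the next. A minimal sketch of that pattern, using only the keywords from the calls above (it re-feeds the same tokens and omits next-token selection, so it is not real generation):

using Transformers.HuggingFace

model = hgf"gpt2:lmheadmodel"
tokens = reshape(Array(1:10), (:, 1))

let past = nothing  # no cache on the first iteration
    for _ in 1:3
        out = model(tokens; position_ids=nothing, token_type_ids=nothing,
                    past_key_values=past, attention_mask=nothing,
                    output_attentions=true, output_hidden_states=true,
                    use_cache=true)
        past = out.past_key_values  # reuse the cache on the next call
    end
end
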
chengchingwen commented 3 years ago

My bad. The new master should fix it.

AdarshKumar712 commented 3 years ago

Yes, now it works properly. Thanks a lot!

chengchingwen commented 3 years ago

@AdarshKumar712 btw, could you also make a simple PR for a text generation example with gpt2?

AdarshKumar712 commented 3 years ago

Sure, I will start working on the PR.

Just one thing: for the tokenizer, should I use the bpe tokenizer available with load_pretrain("GPT-OpenAIftlm", ...), along with its vocab? The vocab size we get from OpenAI is around 40k, but the HuggingFace model expects a 50k vocab size.