Closed: AdarshKumar712 closed this issue 3 years ago
I was working with the HuggingFace GPT2 model. I am trying to differentiate a loss on the GPT2 logits with respect to pre-computed `past_key_values`. While the forward pass works correctly, I get the following error in the backward pass:
```
MethodError: no method matching AbstractFloat(::Type{Any})
Closest candidates are:
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  (::Type{T})(::Base.TwicePrecision) where T<:Number at twiceprecision.jl:243
  (::Type{T})(::Flux.NilNumber.Nil) where T<:Number at /home/adarshkumar712/.julia/packages/Flux/0c9kI/src/outputsize.jl:17
  ...
Stacktrace:
  [1] float(x::Type)
    @ Base ./float.jl:206
  [2] (::ComposedFunction{typeof(float), typeof(eltype)})(x::Function)
    @ Base ./operators.jl:938
  [3] softmax(x::Function; dims::Int64)
    @ NNlib ~/.julia/packages/NNlib/3MZcC/src/softmax.jl:48
  [4] rrule(::typeof(softmax), xs::Function; dims::Int64)
    @ NNlib ~/.julia/packages/NNlib/3MZcC/src/softmax.jl:80
  [5] chain_rrule_kw
    @ ~/.julia/packages/Zygote/zowrf/src/compiler/chainrules.jl:101 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0 [inlined]
  [7] _pullback(::Zygote.Context, ::NNlib.var"#softmax##kw", ::NamedTuple{(:dims,), Tuple{Int64}}, ::typeof(softmax), ::typeof(Transformers.HuggingFace.apply_shift_mask))
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:9
  [8] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:124 [inlined]
  [9] _pullback(::Zygote.Context, ::typeof(Transformers.HuggingFace._attn), ::Array{Float32, 4}, ::Array{Float32, 4}, ::Array{Float32, 4}, ::Transformers.HuggingFace.ShiftAttentionMask{Array{Float32, 4}})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [10] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:138 [inlined]
 [11] _pullback(::Zygote.Context, ::Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, ::Array{Float32, 4}, ::Array{Float32, 4}, ::Array{Float32, 4}, ::Transformers.HuggingFace.ShiftAttentionMask{Array{Float32, 4}}, ::Val{false}, ::Val{true})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [12] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:97 [inlined]
 [13] _pullback(::Zygote.Context, ::Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, ::Array{Float32, 3}, ::Tuple{Array{Float32, 4}, Array{Float32, 4}}, ::Transformers.HuggingFace.ShiftAttentionMask{Array{Float32, 4}}, ::Val{false}, ::Val{true})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [14] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:224 [inlined]
 [15] _pullback(::Zygote.Context, ::Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}, ::Array{Float32, 3}, ::Tuple{Array{Float32, 4}, Array{Float32, 4}}, ::Transformers.HuggingFace.ShiftAttentionMask{Array{Float32, 4}}, ::Val{false}, ::Val{true})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [16] macro expansion
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:413 [inlined]
 [17] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:326 [inlined]
 [18] _pullback(::Zygote.Context, ::Transformers.HuggingFace.HGFGPT2Model{12, Transformers.HuggingFace.FakeTHModuleList{12, NTuple{12, Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}}, ::Matrix{Int64}, ::Nothing, ::Nothing, ::NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}, ::Nothing, ::Val{false}, ::Val{true}, ::Val{true})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [19] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:473 [inlined]
 [20] _pullback(::Zygote.Context, ::Transformers.HuggingFace.HGFGPT2LMHeadModel{Transformers.HuggingFace.HGFGPT2Model{12, Transformers.HuggingFace.FakeTHModuleList{12, NTuple{12, Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}}, Transformers.HuggingFace.FakeTHLinear{LinearAlgebra.Transpose{Float32, Matrix{Float32}}, Nothing}}, ::Matrix{Int64}, ::Nothing, ::Nothing, ::NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}, ::Nothing, ::Val{false}, ::Val{true}, ::Val{true})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [21] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:449 [inlined]
 [22] _pullback(::Zygote.Context, ::Transformers.HuggingFace.var"##_#201", ::Nothing, ::Nothing, ::NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}, ::Nothing, ::Bool, ::Bool, ::Bool, ::Transformers.HuggingFace.HGFGPT2LMHeadModel{Transformers.HuggingFace.HGFGPT2Model{12, Transformers.HuggingFace.FakeTHModuleList{12, NTuple{12, Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}}, Transformers.HuggingFace.FakeTHLinear{LinearAlgebra.Transpose{Float32, Matrix{Float32}}, Nothing}}, ::Matrix{Int64})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [23] _pullback
    @ ~/.julia/packages/Transformers/rCnGb/src/huggingface/models/gpt2.jl:449 [inlined]
 [24] _pullback(::Zygote.Context, ::Core.var"#Any##kw", ::NamedTuple{(:past_key_values, :output_attentions, :output_hidden_states, :use_cache), Tuple{NTuple{12, Tuple{Array{Float32, 4}, Array{Float32, 4}}}, Bool, Bool, Bool}}, ::Transformers.HuggingFace.HGFGPT2LMHeadModel{Transformers.HuggingFace.HGFGPT2Model{12, Transformers.HuggingFace.FakeTHModuleList{12, NTuple{12, Transformers.HuggingFace.HGFGPT2Block{Transformers.HuggingFace.HGFGPT2Attention{Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}, Transformers.HuggingFace.HGFGPT2MLP{typeof(gelu), Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}, Transformers.HuggingFace.FakeHGFConv1D{Matrix{Float32}, Vector{Float32}}}}}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHEmbedding{Matrix{Float32}}, Transformers.HuggingFace.FakeTHLayerNorm{Vector{Float32}}}, Transformers.HuggingFace.FakeTHLinear{LinearAlgebra.Transpose{Float32, Matrix{Float32}}, Nothing}}, ::Matrix{Int64})
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [25] _pullback
    @ ./In[8]:3 [inlined]
 [26] _pullback(::Zygote.Context, ::var"#1#2")
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface2.jl:0
 [27] pullback(f::Function, ps::Params)
    @ Zygote ~/.julia/packages/Zygote/zowrf/src/compiler/interface.jl:250
 [28] top-level scope
    @ In[8]:2
 [29] eval
    @ ./boot.jl:360 [inlined]
 [30] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1094
```
I believe this is related to the `ShiftAttentionMask`: frame [7] of the trace shows `softmax` being applied to the function `Transformers.HuggingFace.apply_shift_mask` itself rather than to an array. Here is the MWE:
```julia
using Transformers.HuggingFace
using Zygote, Flux

model = hgf"gpt2:lmheadmodel"
tokens = reshape(Array(1:10), (:, 1));

outputs = model(tokens[1:end-1, :]; position_ids=nothing, token_type_ids=nothing,
                past_key_values=nothing, attention_mask=nothing,
                output_attentions=true, output_hidden_states=true, use_cache=true);
past = outputs.past_key_values
prev = tokens[end:end, :];

ps = params(past)
_, back = Zygote.pullback(ps) do
    output_1 = model(prev; past_key_values=past, output_attentions=false,
                     output_hidden_states=true, use_cache=true);
    hidden = output_1.hidden_states[end]
    logits = model.lm_head(hidden)[:, end, :]
    logits[1]
end
```
Please let me know if I am doing something wrong in this code. cc @chengchingwen
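For comparison, differentiating through a `softmax` with respect to a pre-computed array works fine in isolation, so the failure seems specific to the masked-attention path. A minimal sketch with a toy array (no Transformers.jl involved; the array here is just a stand-in for one cached tensor from `past_key_values`):

```julia
using Zygote, NNlib

# Toy stand-in for one pre-computed cache tensor.
past = randn(Float32, 8, 4)

# Same Params-based pullback pattern as in the MWE above,
# taking the gradient of a scalar loss w.r.t. the cached array.
ps = Zygote.Params([past])
loss, back = Zygote.pullback(ps) do
    sum(softmax(past; dims=1)[1, :])
end
grads = back(one(loss))

# The gradient has the same shape as the cached array.
size(grads[past]) == size(past)
```

This pullback succeeds, which is why I suspect the `ShiftAttentionMask` branch rather than the cache gradients themselves.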
Should be fixed with the new release (v0.1.13). Let me know if the problem remains.