chengchingwen / Transformers.jl

Julia Implementation of Transformer models
MIT License

Tutorial throws a DimensionMismatch error on grad = gradient(()->loss(x, y), ps) #77

Closed (JadeOfNameless closed 3 years ago)

JadeOfNameless commented 3 years ago

I copy-pasted the tutorial into VS Code and have been trying to run it for the past few days, to no avail.

All that happens is that it runs slowly for ages and then crashes with the following error:

ERROR: LoadError: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 0 and 11")
Stacktrace:
  [1] _bcs1 @ .\broadcast.jl:501 [inlined]
  [2] _bcs(shape::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, newshape::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}) (repeats 2 times) @ Base.Broadcast .\broadcast.jl:495
  [3] broadcast_shape @ .\broadcast.jl:489 [inlined]
  [4] combine_axes @ .\broadcast.jl:484 [inlined]
  [5] instantiate @ .\broadcast.jl:266 [inlined]
  [6] materialize @ .\broadcast.jl:883 [inlined]
  [7] adjoint @ C:\Users\Jade\.julia\packages\Zygote\BCfwJ\src\lib\broadcast.jl:74 [inlined]
  [8] _pullback @ C:\Users\Jade\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:65 [inlined]
  [9] _pullback @ C:\Users\Jade\.julia\packages\Transformers\V363g\src\basic\loss.jl:25 [inlined]
 [10] _pullback(::Zygote.Context, ::typeof(logkldivergence), ::Array{Float32, 3}, ::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}) @ Zygote C:\Users\Jade\.julia\packages\Zygote\BCfwJ\src\compiler\interface2.jl:0
 [11] _pullback @ C:\Users\Jade\OneDrive\Documents\juliaStuff\test.jl:82 [inlined]
 [12] _pullback(::Zygote.Context, ::typeof(loss), ::CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}, ::CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}) @ Zygote C:\Users\Jade\.julia\packages\Zygote\BCfwJ\src\compiler\interface2.jl:0
 [13] _pullback @ C:\Users\Jade\OneDrive\Documents\juliaStuff\test.jl:105 [inlined]
 [14] _pullback(::Zygote.Context, ::var"#4#6") @ Zygote C:\Users\Jade\.julia\packages\Zygote\BCfwJ\src\compiler\interface2.jl:0
 [15] pullback(f::Function, ps::Zygote.Params) @ Zygote C:\Users\Jade\.julia\packages\Zygote\BCfwJ\src\compiler\interface.jl:338
 [16] gradient(f::Function, args::Zygote.Params) @ Zygote C:\Users\Jade\.julia\packages\Zygote\BCfwJ\src\compiler\interface.jl:75
 [17] train!() @ Main C:\Users\Jade\OneDrive\Documents\juliaStuff\test.jl:105
 [18] top-level scope @ C:\Users\Jade\OneDrive\Documents\juliaStuff\test.jl:114

JadeOfNameless commented 3 years ago

Is anyone out there?

JadeOfNameless commented 3 years ago

I don't want to be rude, but at this point I'm worried that rolling my own transformer library might literally be more time-effective than waiting for a response here.

What can I do to get help?

chengchingwen commented 3 years ago

Sorry about that. I was quite busy and missed the notification. Basically, the error occurs because the tutorial is out of date and no longer works. I will update it this weekend.

JadeOfNameless commented 3 years ago

Oh, thanks! Sorry if I was rude.