Closed janfrancu closed 4 years ago
I can reproduce this result on Julia 1.4.2 with the master branch. It does look like there are some problems with type inference for multihead attention. I will take some time to fix this.
Thanks for reporting it!
Should be fixed in the new release (v0.1.7)
The forward step of Transformer is type unstable. Running the example from the docs and checking the forward pass with `@code_warntype` shows type-unstable output. The source of the instability is probably the multihead attention, but I have not been able to distill it any further. I am using the latest tagged version 0.1.3 of Transformers on Julia 1.4.1.
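The check described above can be sketched roughly as follows. This is a minimal, hedged reproduction assuming the Transformers.jl v0.1.x API and Flux; the exact constructor arguments (hidden size, number of heads, head size, feed-forward size) and input shape are illustrative, not taken from the original report:

```julia
using Transformers, Flux  # packages assumed from the report

# Hypothetical small layer in the style of the docs example:
# Transformer(hidden, heads, head_size, ffn_size) -- signature assumed for v0.1.x
t = Transformer(512, 8, 64, 2048)

# Dummy input: (features, sequence length, batch)
x = randn(Float32, 512, 7, 3)

# Inspect the inferred types of the forward pass; entries highlighted
# in red (or typed `Any`) in the printed output indicate type instability.
@code_warntype t(x)
```

`@code_warntype` prints the lowered, type-annotated IR of the call; a fully inferred forward pass should show concrete concrete return types rather than `Any` or small `Union`s, which is what the issue reports was not the case for the multihead attention.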