The main differences are the replacement of the Flux.Diagonal layers with Flux.Scale, which is not a big problem in itself.
The main problem is in the GPT.Stack layer.
In the latest Transformers.jl (@0.1.25):
1. In v0.1.15, load the model, extract the parameters (either with Functors or `get_state_dict`), and save them to disk (the saved file should then only contain plain arrays, regardless of model type).
2. In v0.1.25, load the parameters from step 1 and build the model with the new type (either by calling the constructor with those arrays, or by constructing the model with the default initializer and manually assigning each saved parameter to the correct array in the model).
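The two steps above could be sketched roughly as follows. This is a minimal, hedged sketch: the file names and the `build_new_model` constructor are placeholders (not real Transformers.jl API), and it assumes both versions enumerate the parameters in the same order with matching shapes.

```julia
# --- Step 1: run in an environment with Transformers@0.1.15 / Flux@0.12 ---
using Flux, BSON
BSON.@load "old_model.bson" model      # placeholder file name
# Flux.params collects every trainable array; copy them out as plain arrays
# so the saved file carries no Flux/Transformers layer types.
ps = [copy(p) for p in Flux.params(model)]
BSON.@save "params.bson" ps

# --- Step 2: run in an environment with Transformers@0.1.25 / Flux@0.13 ---
using Flux, BSON
BSON.@load "params.bson" ps
model = build_new_model()              # placeholder: construct with default init
# Copy each saved array into the matching parameter of the new model;
# assumes both versions traverse parameters in the same order.
Flux.loadparams!(model, ps)
```

Because only raw arrays are written to disk, the BSON file from step 1 can be deserialized in the new environment without BSON trying to reconstruct the old `Gpt`/`Stack` types.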
I am currently trying to load an old Transformer model built with Flux@0.12.10 and Transformers@0.1.15 into the new Transformers@0.1.25. I tried the different approaches proposed in https://discourse.julialang.org/t/how-to-load-bson-file-of-the-model-build-with-flux-0-12-10-to-use-with-flux-0-13-flux-diagonal-deprecated-problem/91588. However, I found that the old version of the Transformer model differs structurally from the new version.
For Transformers@0.1.25 and Flux@0.13.10:
For the Transformers@0.1.15 model:
while in the previous version, Transformers@0.1.15:
Due to the difference in the GPT.Stack layers, i.e.

Gpt{Stack{Symbol("x':x => 4"), NTuple{4,...

vs.

Gpt{Stack{NTuple{4,...

the following error occurs: