davmacario closed this 7 months ago
Just syncing the branches.
Added weight tying to the starter node model, which now contains both the token embedding and the final linear layer. The two layers perform the same mapping in opposite directions (token to vector, and vector to per-token scores), so they can share a single weight matrix.
Added a README disclaimer: this repo is WIP.
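
For reviewers unfamiliar with weight tying, here is a minimal framework-free sketch of the idea. It is not the code from this PR: the class name `TiedStarterNode`, the sizes, and the initial values are all hypothetical, and the matrix is a plain list of lists rather than a tensor.

```python
# Hypothetical toy model: names and sizes are illustrative, not from the repo.
class TiedStarterNode:
    def __init__(self, vocab_size, n_embd):
        # One (vocab_size x n_embd) matrix, stored once.
        self.weight = [
            [0.01 * (i + j) for j in range(n_embd)] for i in range(vocab_size)
        ]
        # Both "layers" reference the same object: this is the weight tying.
        self.token_embedding = self.weight
        self.lm_head = self.weight

    def embed(self, token_id):
        # Token id -> vector: select one row of the shared matrix.
        return self.token_embedding[token_id]

    def logits(self, hidden):
        # Vector -> one score per token: dot product with every row.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.lm_head]


model = TiedStarterNode(vocab_size=4, n_embd=3)
model.token_embedding[2][0] = 1.0   # an update made through the embedding...
tied = model.lm_head[2][0] == 1.0   # ...is visible through the output layer
scores = model.logits(model.embed(1))  # one score per vocabulary entry
```

In PyTorch the same effect is commonly obtained by making the output layer's `weight` the same parameter as the embedding's `weight`, since an `nn.Embedding(vocab_size, n_embd)` and an `nn.Linear(n_embd, vocab_size)` both store a `(vocab_size, n_embd)` matrix. Whether this PR does it exactly that way is an assumption.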