Sorts the token output probabilities printed by the script in descending order.
Fixes the updated examples in the rules.spec file.
Adds a couple of comments.
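The probability sorting change could look roughly like this (a hypothetical sketch; the actual type and function names in the script may differ):

```typescript
// Hypothetical sketch: sort per-token probabilities descending,
// the order the script's prediction table is printed in below.
type TokenProb = { token: string; prob: number };

function sortTokenProbs(probs: TokenProb[]): TokenProb[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...probs].sort((a, b) => b.prob - a.prob);
}

const example: TokenProb[] = [
  { token: 'cat', prob: 0.24495488 },
  { token: 'flower', prob: 0.27120873 },
  { token: 'monkey', prob: 0.19198164 },
];
console.log(sortTokenProbs(example).map((p) => p.token).join(' '));
// prints "flower cat monkey"
```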
Playing around with some hyper-params for tiny transformers, I think my layer-norm implementation might be broken; with layer norm disabled, the transformer learns as expected (result below), but with layer norm enabled, the model only learns very general stats.
Example successful run:
$ npx ts-node src/lib/seqtasks/tiny_worlds.run_with_transformer.script.ts
Final part of the output:
batch: 290 entropyLoss: 1.05391383 accuracy: 0.53125000
(0) is _a:monkey, is _b: ---> cat, is _c
Inference Step: 0
Context: is _a:monkey, is _b:
Target Output: cat
Target next token: cat
Prediction:
token prob
flower 0.27120873
cat 0.24495488 <- Target
monkey 0.19198164
rock 0.12438986
tree 0.08662897
elephant 0.07370050
: 0.00210857
, 0.00078981
_a 0.00078500
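For reference while debugging, a minimal layer-norm over the last axis looks like the sketch below. This is not the project's actual implementation, just a plain-TypeScript baseline to compare against; common bugs are normalizing over the wrong axis, dropping the epsilon, or misapplying the gain/bias.

```typescript
// Minimal sketch of layer normalization over a single vector (last axis).
// Hypothetical reference code, not the project's implementation.
function layerNorm(x: number[], gain = 1, bias = 0, eps = 1e-5): number[] {
  const mean = x.reduce((s, v) => s + v, 0) / x.length;
  const variance = x.reduce((s, v) => s + (v - mean) ** 2, 0) / x.length;
  const invStd = 1 / Math.sqrt(variance + eps);
  // Normalize to ~zero mean / unit variance, then scale and shift.
  return x.map((v) => (v - mean) * invStd * gain + bias);
}
```

With default gain/bias the output should have mean ~0 and variance ~1, which is an easy invariant to assert against a suspect implementation.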