PAIR-code / tiny-transformers

Apache License 2.0

small fixes, comments, and script tweak #22

Closed: iislucas closed this 3 months ago

iislucas commented 3 months ago

Playing around with some hyper-params for tiny transformers, I think my layer-norm implementation might be broken: with layer norm turned off, the transformer learns as expected (result below), but with layer norm enabled, the model only learns very general statistics.
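For reference, here is a minimal sketch of what layer norm should compute over the feature dimension of a single activation vector. This is a hypothetical standalone helper, not the repo's implementation; `gain`, `bias`, and `eps` are assumed parameter names. A broken implementation often normalizes over the wrong axis, omits the epsilon, or forgets to subtract the mean, which can leave the model stuck at roughly uniform-statistics predictions like the ones below.

```typescript
// Hypothetical layer-norm sketch: normalize one feature vector to
// zero mean and unit variance, then apply a learned scale and shift.
function layerNorm(
  x: number[],
  gain: number = 1,
  bias: number = 0,
  eps: number = 1e-5,
): number[] {
  const n = x.length;
  const mean = x.reduce((a, b) => a + b, 0) / n;
  // Biased (population) variance, as is conventional for layer norm.
  const variance = x.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance + eps);
  return x.map((v) => gain * ((v - mean) / std) + bias);
}
```

A quick sanity check is that the output of `layerNorm` (with default gain/bias) has mean ~0 and variance ~1 for any non-constant input; if the real implementation fails that check on a single vector, the bug is in the norm itself rather than in how it is wired into the transformer block.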

Example successful run: `$ npx ts-node src/lib/seqtasks/tiny_worlds.run_with_transformer.script.ts`

Final part of the output

batch: 290     entropyLoss: 1.05391383  accuracy: 0.53125000
(0) is _a:monkey, is _b: ---> cat, is _c
Inference Step: 0
Context: is _a:monkey, is _b:
Target Output: cat
Target next token: cat
Prediction:
    token        prob
    flower       0.27120873
    cat          0.24495488    <- Target
    monkey       0.19198164
    rock         0.12438986
    tree         0.08662897
    elephant     0.07370050
    :            0.00210857
    ,            0.00078981
    _a           0.00078500