ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

merged with master and enabled plotting of strongermax/polymax #165

Closed Hrancheng closed 3 months ago

Hrancheng commented 3 months ago

This PR extends the plotting of input/output statistics to strongermax and polymax. It is used the same way as for consmax: specify the statistic to visualize on the command line. For example:

python3 train.py --out_dir=out --device=cpu --eval_interval=2 --log_interval=1 --block_size=2 --batch_size=5 --n_layer=3 --n_head=4 --n_embd=16 --lr_decay_iters=2 --dtype="float32" --max_iter=20 --softmax_variant_attn="polymax" --statistic=output_mean
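As a rough illustration of what a statistic such as output_mean might record per step, here is a minimal standalone sketch. It is not the repository's code: `softmax` is an ordinary softmax standing in for polymax/strongermax, and `collect_statistic`, `REDUCERS`, and every statistic name other than output_mean are hypothetical names chosen for this example.

```python
import math
from statistics import mean


def softmax(xs):
    # Ordinary softmax, used only as a stand-in for polymax/strongermax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical reducers; only "output_mean" appears in the PR text.
REDUCERS = {"mean": mean, "max": max, "min": min}


def collect_statistic(name, inputs, outputs):
    """Reduce the inputs or outputs according to a name like 'output_mean'."""
    side, _, reducer = name.partition("_")
    if side not in ("input", "output") or reducer not in REDUCERS:
        raise ValueError(f"unknown statistic: {name}")
    values = inputs if side == "input" else outputs
    return REDUCERS[reducer](values)


# Record one value per step, the kind of series a plot would later consume.
history = []
for step, scores in enumerate([[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]):
    probs = softmax(scores)
    history.append((step, collect_statistic("output_mean", scores, probs)))
```

With a normalizing function such as softmax the output mean is always 1 divided by the number of entries, so this statistic is mainly informative for variants whose outputs do not sum to one.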