ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add wte factorization #197

Open klei22 opened 1 month ago

klei22 commented 1 month ago

This is a draft for experimenting with new variations on embedding tables.

klei22 commented 1 month ago
  1. Dataset preparation:

    • cd data/aozorabunko_clean
    • bash get_dataset.sh - downloads the dataset (requires the jq command)
    • bash process.sh - processes the raw text (takes a while)
    • python3 prepare.py -t input.txt --method char
  2. Manual embedding table creation:

    • python3 mapping.py converts the manual table into an .npy embedding matrix.
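As a rough illustration of the step above, here is a minimal sketch of producing and reloading an .npy embedding matrix (assumption: the filename, sizes, and random values are placeholders; mapping.py derives its rows from the hand-crafted table instead):

```python
import numpy as np

# Hypothetical sizes: a tiny char-level vocab and embedding dimension.
vocab_size = 8
n_embd_main = 4

# Placeholder values; mapping.py would fill these from the manual table.
wte = np.random.randn(vocab_size, n_embd_main).astype(np.float32)
np.save("wte_factor.npy", wte)

# Reload and verify the shape that train.py's --n_embd_main must match.
loaded = np.load("wte_factor.npy")
print(loaded.shape)  # (8, 4)
```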
  3. train.py and model.py setup:

    • model.py loads this .npy file to initialize the wte (word token embedding) table and the lm head, if configured in the GPT class constructor
    • the --n_embd_main argument passed to train.py must match the embedding dimension of the .npy matrix
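To make the factorization concrete, here is a minimal PyTorch sketch of the idea (assumption: the names n_embd_main, up_proj, and down_proj, and the weight-tying scheme are illustrative, not the PR's exact code): the token embedding lives in a smaller n_embd_main space and is projected up to the model width, shrinking the embedding parameter count.

```python
import torch
import torch.nn as nn

vocab_size, n_embd_main, n_embd = 100, 16, 64

# Factorized embedding: small table, then a learned up-projection.
wte = nn.Embedding(vocab_size, n_embd_main)   # could be loaded from the .npy file
up_proj = nn.Linear(n_embd_main, n_embd, bias=False)

idx = torch.randint(0, vocab_size, (2, 8))    # (batch, seq) token ids
x = up_proj(wte(idx))                         # (2, 8, n_embd) input to the blocks

# For the lm head, project hidden states back down and score against the
# same small table (weight tying in the reduced space).
down_proj = nn.Linear(n_embd, n_embd_main, bias=False)
h = torch.randn(2, 8, n_embd)                 # stand-in for transformer output
logits = down_proj(h) @ wte.weight.T          # (2, 8, vocab_size)
print(logits.shape)
```

With this factorization the embedding side costs vocab_size * n_embd_main + n_embd_main * n_embd parameters instead of vocab_size * n_embd.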