bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Improve SmolGPT #73

Closed: bclarkson-code closed this issue 1 month ago

bclarkson-code commented 2 months ago

SmolGPT (49M) does not perform very well. I gave it the following prompt:

def add_one(x):

It completed it as follows:

def add_one(x):
    return 10
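One way to make "does not perform very well" concrete is to run the completion and check it against a few input/output pairs, in the style of functional-correctness code benchmarks. The helper below is a minimal sketch, not part of Tricycle; the name `passes_tests` and the test cases are illustrative assumptions. Because it uses `exec`, it should only be run on model output inside a sandbox.

```python
def passes_tests(completion: str, func_name: str, cases: list) -> bool:
    """Execute a generated function definition and check it on (args, expected) pairs.

    Hypothetical helper for illustration. exec() runs arbitrary code, so only
    use this on untrusted model output inside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(completion, namespace)  # defines the generated function in namespace
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions and runtime errors all count as failures
        return False


cases = [((0,), 1), ((9,), 10), ((-3,), -2)]
print(passes_tests("def add_one(x):\n    return 10\n", "add_one", cases))     # False
print(passes_tests("def add_one(x):\n    return x + 1\n", "add_one", cases))  # True
```

Note that the SmolGPT completion happens to pass the `add_one(9)` case, which is why a check like this needs several test inputs rather than one.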

I think that there are a number of issues.

First, the model could, of course, be bigger. With more optimised kernels, modern techniques like rotary embeddings, and multi-GPU support, we will hopefully be able to train a larger model in a reasonable amount of time.
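Rotary embeddings (RoPE) encode position by rotating each pair of query/key channels by an angle proportional to the token's position, so the attention score between two tokens depends only on their relative offset. The NumPy sketch below illustrates the idea for a single head. It is a generic illustration, not Tricycle's implementation: the function name `rotary_embedding`, the interleaved channel pairing (some implementations rotate the two halves of the vector instead), and the common base of 10000 are all assumptions.

```python
import numpy as np


def rotary_embedding(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x with shape (seq_len, head_dim).

    Hypothetical sketch: channel pairs (2i, 2i+1) at position p are rotated
    by the angle p * base ** (-2i / head_dim). head_dim must be even.
    """
    x = np.asarray(x, dtype=np.float64)
    seq_len, head_dim = x.shape
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")

    # One rotation frequency per channel pair, shape (head_dim / 2,)
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    # Rotation angle for every (position, pair), shape (seq_len, head_dim / 2)
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Standard 2D rotation applied to each channel pair
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

Because each pair is only rotated, vector norms are preserved, and the dot product between a rotated query at position m and a rotated key at position n depends only on n - m. This is what lets the model use relative positions without a learned position table.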

Second, the dataset can probably be improved. More evaluation is needed, but I think adding some web text could make the model easier to use.

bclarkson-code commented 1 month ago

This was fixed by training GPT-2 (124M).