In the current GPT implementation, tokens are embedded by one-hot encoding them and passing them through a linear layer. Because the inputs are one-hot vectors, the full matrix multiplication is unnecessary: multiplying a one-hot row by the weight matrix simply selects the corresponding row, so we can replace the multiplication with a direct row lookup.
We can encapsulate this in an Embedding layer, which should significantly reduce both memory usage and computational load.
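To make the equivalence concrete, here is a minimal NumPy sketch (the shapes and variable names are illustrative assumptions, not taken from the actual implementation). It shows that the one-hot matmul and the row lookup produce identical embeddings:

```python
import numpy as np

vocab_size, d_model = 50257, 768  # assumed GPT-2-style dimensions
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, d_model)).astype(np.float32)  # embedding weights

token_ids = np.array([15496, 995])  # arbitrary example token ids

# Current approach: one-hot encode, then multiply through the linear layer.
one_hot = np.zeros((len(token_ids), vocab_size), dtype=np.float32)
one_hot[np.arange(len(token_ids)), token_ids] = 1.0
embedded_matmul = one_hot @ W  # (2, vocab_size) @ (vocab_size, d_model)

# Embedding-layer approach: index directly into the same weight matrix.
embedded_lookup = W[token_ids]  # no one-hot tensor, no matmul

assert np.allclose(embedded_matmul, embedded_lookup)
```

The lookup avoids materializing a `(batch, vocab_size)` one-hot tensor entirely, which is where both the memory and compute savings come from.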