davmacario / MDI-LLM

Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT
MIT License
3 stars 2 forks source link

Adopt Conv1D layers instead of Linear layer #6

Closed davmacario closed 4 months ago

davmacario commented 7 months ago

As explained here, in A. Karpathy's repository, he used Linear layers in the model, while the GPT2 implementation from Huggingface uses Conv1D layers.

The result is the same, but this requires less operations when loading from pretrained, plus it's compliant with the Transformers library.

davmacario commented 4 months ago

Not pertinent anymore - switched to LitGPT