davmacario / MDI-LLM

Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT
MIT License

[FEAT]: first working model, tested, using NanoGPT #2

Closed: davmacario closed this issue 8 months ago

davmacario commented 8 months ago

First milestone hit: the system works and implements recurrent pipelining correctly. The model used is NanoGPT, split among the nodes (tested with up to 3). Message exchange happens over two channels: an HTTP server for control messages, and two low-level TCP/IP sockets for exchanging activations during inference.
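As a rough illustration of the activation-exchange channel described above, here is a minimal sketch of length-prefixed framing over a TCP socket. The framing format, function names, and payload handling are assumptions for illustration, not the repository's actual wire protocol:

```python
import socket
import struct

# Assumed framing: a 4-byte big-endian length prefix before each
# serialized activation tensor (hypothetical, not the repo's protocol).
HEADER_FMT = "!I"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def send_activations(sock: socket.socket, payload: bytes) -> None:
    # Prefix the serialized activations with their length so the
    # receiver knows how many bytes belong to this message.
    sock.sendall(struct.pack(HEADER_FMT, len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP is a byte stream: keep reading until exactly n bytes arrive.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_activations(sock: socket.socket) -> bytes:
    # Read the length header first, then the payload it announces.
    (length,) = struct.unpack(HEADER_FMT, recv_exact(sock, HEADER_SIZE))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # Loopback demo: a connected socket pair standing in for two nodes.
    a, b = socket.socketpair()
    send_activations(a, b"\x00\x01\x02\x03")
    assert recv_activations(b) == b"\x00\x01\x02\x03"
```

In a real deployment each pipeline stage would hold one receiving socket for the previous node and one sending socket for the next, which matches the two sockets per node mentioned above.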

Known issue: generation is slow at the beginning (while n_tokens < context_size).