First milestone hit - the system works and implements recurrent pipelining correctly.
The model used is NanoGPT, split among the nodes (tested with up to 3).
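For reference, a minimal sketch of what such a layer split could look like, assuming the stock NanoGPT architecture where the transformer blocks sit in an `nn.ModuleList`; the helper name and ranking scheme are illustrative, not the project's actual code:

```python
# Sketch: assign a contiguous slice of NanoGPT's transformer blocks to each node.
# Assumes blocks live in an nn.ModuleList (as in NanoGPT's model.transformer.h);
# the function name and the rank-based split are hypothetical.
import torch.nn as nn

def blocks_for_node(blocks: nn.ModuleList, n_nodes: int, node_rank: int) -> nn.ModuleList:
    """Return the slice of transformer blocks owned by node `node_rank`."""
    per_node = (len(blocks) + n_nodes - 1) // n_nodes   # ceiling division
    start = node_rank * per_node
    end = min(start + per_node, len(blocks))
    return nn.ModuleList(list(blocks[start:end]))

# Example: 12 blocks over 3 nodes -> node 0 owns blocks 0-3, node 1 owns 4-7, node 2 owns 8-11.
```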
Message exchange happens over both an HTTP server (for control messages) and two low-level TCP/IP sockets used to exchange activations during inference.
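The wire format used on the activation sockets isn't described here; the snippet below is only a sketch of one plausible framing (a 4-byte length prefix followed by a float32 payload) over plain TCP. The header layout, dtype, and function names are assumptions, not the project's actual protocol.

```python
# Sketch: length-prefixed exchange of activation tensors over a raw TCP socket.
# The 4-byte big-endian size header, float32 dtype, and helper names are
# assumptions for illustration; they are not the project's actual protocol.
import socket
import struct
import numpy as np

def send_activations(sock: socket.socket, acts: np.ndarray) -> None:
    payload = np.ascontiguousarray(acts, dtype=np.float32).tobytes()
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_activations(sock: socket.socket, shape: tuple) -> np.ndarray:
    (size,) = struct.unpack(">I", _recv_exact(sock, 4))
    return np.frombuffer(_recv_exact(sock, size), dtype=np.float32).reshape(shape)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf
```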
Known issue: slow generation at the beginning (when n_tokens < context_size).