b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License
1.02k stars · 68 forks

feat: accelerator structure. #90

Closed — b4rtaz closed this 3 weeks ago

b4rtaz commented 3 weeks ago

This PR introduces changes that will soon allow this project to support GPUs.