b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
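The tagline's claim that tensor parallelism divides RAM usage and distributes the workload can be illustrated with a minimal sketch: split a projection weight column-wise across two workers, let each compute its slice, and gather the results. The names and shapes below are illustrative only, not distributed-llama's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))        # one token's activations
W = rng.standard_normal((8, 16))       # full projection weight

# Each "device" holds only half of W, halving its RAM for this layer.
W0, W1 = np.split(W, 2, axis=1)

y0 = x @ W0                            # computed on worker 0
y1 = x @ W1                            # computed on worker 1
y = np.concatenate([y0, y1], axis=1)   # root gathers the slices

assert np.allclose(y, x @ W)           # matches the single-device result
```

Each worker stores and multiplies only its column slice, which is why adding devices both divides memory per node and parallelizes the matmul.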
MIT License
1.4k stars · 94 forks

rope slice. #38

Closed. b4rtaz closed this issue 4 months ago.
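The issue title suggests the question is how RoPE (rotary position embeddings) interacts with tensor slicing: each rotation angle depends on the absolute dimension-pair index within a head, so a worker that owns only a slice of the dimensions must use the angles for its absolute indices rather than angles recomputed from zero. A minimal sketch of that constraint, with hypothetical helper names not taken from distributed-llama:

```python
import numpy as np

def rope_angles(head_dim, pos, base=10000.0):
    # Standard RoPE: one rotation angle per dimension pair (i, i+1).
    i = np.arange(0, head_dim, 2)
    theta = pos / base ** (i / head_dim)
    return np.cos(theta), np.sin(theta)

def apply_rope(x, cos, sin):
    # Rotate consecutive dimension pairs of a head vector.
    x0, x1 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x0 * cos - x1 * sin
    out[1::2] = x0 * sin + x1 * cos
    return out

head_dim, pos = 8, 5
x = np.arange(head_dim, dtype=np.float64)
cos, sin = rope_angles(head_dim, pos)
full = apply_rope(x, cos, sin)          # single-device reference

# Sliced: the second worker owns the second half of the dimensions and
# must use the angles for its absolute pair indices (cos[2:], sin[2:]).
half = head_dim // 2
sliced = np.concatenate([
    apply_rope(x[:half], cos[: half // 2], sin[: half // 2]),
    apply_rope(x[half:], cos[half // 2 :], sin[half // 2 :]),
])
assert np.allclose(sliced, full)        # slicing preserves the result
```

With the angle slices offset correctly, per-worker RoPE reproduces the full-head computation; recomputing angles from index zero on each slice would not.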