b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.

Will this awesome project consider supporting GPU acceleration? #35

Open galenyu opened 2 months ago

galenyu commented 2 months ago

Very impressive work!

However, it doesn't seem to support GPUs. Would you consider adding GPU acceleration?

Any suggestions on how to migrate this project to CUDA/HIP?

Thanks for any help!

b4rtaz commented 2 months ago

Hello @galenyu! Yes, GPU acceleration is planned.
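
For context on what such a port involves: the hot loop in single-token LLM inference is matrix-vector multiplication over the weight matrices, so a CUDA backend would largely mean replacing the CPU matmul routines with device kernels. Below is a minimal, illustrative sketch of that kind of kernel. It is not distributed-llama's actual code, and all names (`matvec`, `d_w`, `d_x`, `d_y`) are hypothetical; a real port would also handle quantized weights and overlap work with the network communication.

```cuda
// Illustrative sketch only -- not distributed-llama's implementation.
// y[row] = dot(W[row, :], x): one thread block per output row; threads in
// the block stride over the columns, then tree-reduce in shared memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void matvec(const float* w, const float* x, float* y, int cols) {
    extern __shared__ float partial[];
    int row = blockIdx.x;
    float sum = 0.0f;
    for (int c = threadIdx.x; c < cols; c += blockDim.x) {
        sum += w[(size_t)row * cols + c] * x[c];
    }
    partial[threadIdx.x] = sum;
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) y[row] = partial[0];
}

int main() {
    const int rows = 4096, cols = 4096;        // hypothetical layer shape
    size_t wBytes = (size_t)rows * cols * sizeof(float);
    float *d_w, *d_x, *d_y;
    cudaMalloc(&d_w, wBytes);
    cudaMalloc(&d_x, cols * sizeof(float));
    cudaMalloc(&d_y, rows * sizeof(float));
    // A real port would cudaMemcpy weights/activations here; zeros keep
    // this sketch self-contained and well-defined.
    cudaMemset(d_w, 0, wBytes);
    cudaMemset(d_x, 0, cols * sizeof(float));

    int threads = 256;
    matvec<<<rows, threads, threads * sizeof(float)>>>(d_w, d_x, d_y, cols);
    cudaDeviceSynchronize();

    cudaFree(d_w); cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```

On the HIP side, kernels written in this style can usually be translated mostly mechanically with AMD's hipify tools, so targeting CUDA first and porting afterwards is a common route.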