Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
Will this awesome project consider supporting GPU acceleration? #35
Very impressive work!
However, the project doesn't seem to support GPUs. Would the author consider adding GPU acceleration?
Are there any suggestions for migrating this project to CUDA/HIP?
Thanks for any help!