b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
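The tagline's idea of "dividing the RAM usage" can be sketched as splitting a layer's weight matrix row-wise across workers: each worker stores only its shard and computes only its slice of the output, which is then concatenated. This is a minimal illustrative sketch of the general tensor-parallelism idea, not distributed-llama's actual implementation; all function names here are made up for the example.

```python
# Hedged sketch of row-wise tensor parallelism for a matrix-vector
# product (the core of a transformer linear layer). Not the project's
# real code: names and structure are illustrative only.

def matvec(W, x):
    # Plain single-device matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tensor_parallel_matvec(W, x, n_workers):
    # Split W into contiguous row shards; each "worker" would hold
    # only its shard, so per-device RAM is ~1/n_workers of the full W.
    chunk = len(W) // n_workers
    shards = [W[i * chunk:(i + 1) * chunk] for i in range(n_workers)]
    # Each worker computes its slice of the output independently.
    partials = [matvec(shard, x) for shard in shards]
    # Concatenating the partial results (an "all-gather" in a real
    # distributed setup) reproduces the full output.
    return [v for p in partials for v in p]
```

Because the shards never overlap, the parallel result matches the single-device product exactly; the cost of the split is the communication step that gathers the partial outputs.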
MIT License

Support for multiple NVIDIA Jetson AGX Orin devices? #79

Open WangFengtu1996 opened 1 month ago

WangFengtu1996 commented 1 month ago

Does distributed-llama support running across multiple NVIDIA Jetson AGX Orin devices?

b4rtaz commented 1 month ago

Only CPUs now.

kami4ka commented 1 month ago

> Only CPUs now.

Having GPUs and TPUs support would be insanely appreciated.

zhengpeirong commented 1 month ago

> Only CPUs now.

> Having GPUs and TPUs support would be insanely appreciated.

https://github.com/ysyisyourbrother/Galaxy-LM is a similar repo that supports GPU-only execution, e.g., on Jetson devices.