Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
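The idea of "distributing the workload and dividing the RAM usage" can be sketched with a column-wise split of a weight matrix. This is only an illustrative toy in NumPy, not distributed-llama's actual implementation: each "device" stores just its shard of the weights and computes its partial output independently, and the shard outputs are concatenated at the end.

```python
import numpy as np

# Illustrative tensor-parallel matmul (assumed setup, not the project's code):
# the weight matrix is split column-wise across two "devices", so each device
# holds only half the weights (dividing RAM usage) and does half the work.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))        # input activations
w = rng.standard_normal((8, 6))        # full weight matrix

shards = np.split(w, 2, axis=1)        # each device stores half the columns
partials = [x @ s for s in shards]     # each device computes independently
y = np.concatenate(partials, axis=1)   # gather the partial outputs

assert np.allclose(y, x @ w)           # matches the single-device result
```

In a real setup the gather step is a network transfer between the root node and the workers, which is why link bandwidth matters as much as per-device compute.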
Hi Mr Bart,
Your distributed-llama is great! However, are there any clear instructions for setting up the whole environment from scratch? I'm interested in distributed-llama, but I have no experience with Raspberry Pi devices, so I don't know how to connect them to my PC or how to install the dependencies on my PC or on the Raspberry Pi devices. Could you please help me with this? Thank you so much!