b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
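As a rough illustration of the tensor-parallelism idea the tagline refers to (splitting a weight matrix across nodes so each holds only a fraction of the RAM), here is a minimal NumPy sketch. It is conceptual only, not the project's actual C++ implementation, and all names are illustrative.

```python
# Conceptual sketch of column-wise tensor parallelism (not distributed-llama's real code):
# a weight matrix is split across N workers, each worker multiplies the full input
# by its shard, and the partial outputs are concatenated to recover x @ W.
import numpy as np

def split_columns(weight: np.ndarray, n_workers: int) -> list[np.ndarray]:
    """Give each worker a contiguous slice of the weight's columns."""
    return np.array_split(weight, n_workers, axis=1)

def parallel_matmul(x: np.ndarray, shards: list[np.ndarray]) -> np.ndarray:
    """Each 'worker' computes x @ shard; in practice these run on separate devices."""
    partials = [x @ shard for shard in shards]
    return np.concatenate(partials, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1, 512))
    W = rng.standard_normal((512, 2048))
    shards = split_columns(W, n_workers=4)  # each node stores only ~1/4 of W
    assert np.allclose(parallel_matmul(x, shards), x @ W)
```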
MIT License

WebAssembly version #15

Closed · pathquester closed this issue 3 months ago

pathquester commented 3 months ago

Is it in the scope of the project to eventually provide a WebAssembly version?

b4rtaz commented 3 months ago

I'm not sure distributed inference in the browser makes sense. Could you provide some context?