Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
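The tagline describes tensor parallelism: each device stores only a shard of a layer's weights and computes a slice of the output, which is how both the workload and the RAM usage get divided. A minimal NumPy sketch (not the project's actual code) of a column-wise split:

```python
# Sketch only: tensor parallelism splits a layer's weight matrix
# column-wise across N workers, so each worker holds 1/N of the
# weights and computes a slice of the output.
import numpy as np

def tensor_parallel_matmul(x, weight, n_workers):
    # Split weights by output columns; each shard would live on one device.
    shards = np.array_split(weight, n_workers, axis=1)
    # Each worker computes its partial result; slices are concatenated.
    partials = [x @ w for w in shards]
    return np.concatenate(partials, axis=-1)

x = np.random.randn(4, 8)
w = np.random.randn(8, 16)
out = tensor_parallel_matmul(x, w, n_workers=4)
assert np.allclose(out, x @ w)  # matches the single-device result
```

Each shard is 1/N of the full matrix, so per-device memory shrinks accordingly; the real system pays an extra communication cost to gather the slices.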
MIT License
fix: chunked stream, close stream without econnreset. #65

Still very buggy, but I'm now able to connect to the dllama-api using the Anything LLM client.