Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License
what(): The tokenizer does not include chat template #97
Hello, @b4rtaz!
I'm running Distributed Llama on a cluster composed of 1 Raspberry Pi 4B 8 GB and 7 Raspberry Pi 4B 4 GB.
I've successfully converted and run the model ajibawa-2023/Uncensored-Jordan-13B in inference mode, obtaining the following results.
But I'm not able to run the same model in chat mode, as it throws the following error. Is there a way to get around this error so I can use this model in chat mode?