EntusiastaIApy opened this issue 3 months ago
Hello @EntusiastaIApy,
I think the problem is this setting: `"num_attention_heads": 52`.
The current implementation expects the number of attention heads to be divisible by the number of nodes without a remainder.
52 / 8 = 6 with a remainder of 4, so 52 heads cannot be split evenly across 8 nodes.
This is basically a bug.
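If it helps to see the constraint concretely, here is a minimal sketch of that divisibility check (illustrative only, not the actual distributed-llama source; all names are made up):

```cpp
#include <cstdio>

int main() {
    const int nHeads = 52;  // "num_attention_heads" from the model config
    const int nNodes = 8;   // total nodes: 1 root + 7 workers

    // The heads must split evenly across the nodes; otherwise the
    // per-node slice sizes disagree and the model cannot be loaded.
    if (nHeads % nNodes != 0) {
        std::printf("%d heads / %d nodes => %d remainder %d: unsupported\n",
                    nHeads, nNodes, nHeads / nNodes, nHeads % nNodes);
        return 1;
    }
    std::printf("OK: %d heads per node\n", nHeads / nNodes);
    return 0;
}
```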
I am facing a similar issue. I am trying to run TinyLlama in the dllama environment, using 2 worker nodes with 8 GB of RAM each, but it throws a similar error.
@Different-Pranav you are using 3 nodes in total (root + 2 workers), so the attention heads cannot be split evenly. You should try 2 nodes (1 root + 1 worker) or 4 nodes (1 root + 3 workers).
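To make the arithmetic explicit, here is a small sketch, assuming TinyLlama-1.1B's config has `"num_attention_heads": 32` (my understanding; please check your model's config.json):

```cpp
#include <cstdio>

int main() {
    const int nHeads = 32;               // assumed TinyLlama-1.1B head count
    const int nodeCounts[] = {2, 3, 4};  // total nodes = root + workers

    for (int nNodes : nodeCounts) {
        if (nHeads % nNodes == 0)
            std::printf("%d nodes: OK, %d heads per node\n",
                        nNodes, nHeads / nNodes);
        else
            std::printf("%d nodes: unsupported, %d heads left over\n",
                        nNodes, nHeads % nNodes);
    }
    return 0;
}
```

Under that assumption, 3 nodes fails (32 % 3 = 2) while 2 and 4 nodes divide evenly.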
Hello, @b4rtaz!
I'm trying to run the model nkpz/llama2-22b-chat-wizard-uncensored on a cluster composed of one Raspberry Pi 4B 8 GB and seven Raspberry Pi 4B 4 GB boards, but in both inference and chat modes Distributed Llama throws the following error. Do you know why this is happening and how to fix it?