b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

Assertion `d % nSlices == 0' failed. #26

Closed. joelewing closed this issue 2 months ago.

joelewing commented 2 months ago

I'm running inference with a Q40 weight of llama-3-70b-instruct across 3 x86_64 machines, and I'm getting this on my root node:

```
main: src/transformer.cpp:17: MatmulSlice::MatmulSlice(FloatType, int, int, int): Assertion `d % nSlices == 0' failed.
```

Any suggestions?

b4rtaz commented 2 months ago

Hello @joelewing,

Yes. Currently Distributed Llama supports only 2^n - 1 workers (1 root + 1 worker, 1 root + 3 workers, 1 root + 7 workers, ...), so the total number of nodes must be a power of two. With 3 nodes, the weight dimension `d` cannot be split evenly, which is exactly what the assertion checks.
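For anyone hitting this, here is a minimal standalone sketch of the constraint behind the assertion (the loop below is illustrative, not Distributed Llama's actual slicing code). Each sliced weight matrix's output dimension `d` must divide evenly by the node count, and since the model dimensions are powers of two (e.g., 8192 for llama-3-70b's hidden size), only power-of-two node counts pass:

```cpp
#include <cstdio>

// Hypothetical sketch: a matrix with output dimension d is split
// into nSlices equal parts, one per node, so d % nSlices must be 0
// or the MatmulSlice constructor's assertion fires.
int main() {
    const int d = 8192; // e.g., llama-3-70b hidden dimension

    for (int nSlices = 2; nSlices <= 8; nSlices++) {
        bool ok = (d % nSlices == 0);
        printf("%d nodes: %s\n", nSlices, ok ? "ok" : "assertion would fail");
    }
    return 0;
}
```

Because 8192 is a power of two, the loop reports "ok" only for 2, 4, and 8 nodes; 3 nodes (the setup above) fails.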

joelewing commented 2 months ago

Yes, that was my issue. I fixed it by adding another worker. My bad for not reading the project description more closely.