Closed: joelewing closed this issue 2 months ago.
Hello @joelewing,
Yes. Currently Distributed Llama supports only 2^n - 1 workers (1 root + 1 worker, 1 root + 3 workers, 1 root + 7 workers, ...). With 3 nodes (1 root + 2 workers) the model dimension cannot be split evenly across the nodes, which is what the `d % nSlices == 0` assertion is reporting.
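For illustration, here is a minimal sketch of the divisibility constraint behind that assertion. It is not the project's actual code; the 8192 hidden dimension for Llama 3 70B and the `canSlice` helper are assumptions made for this example.

```cpp
// Sketch of the divisibility check behind the assertion in src/transformer.cpp.
// Not the project's actual code; dimension and node counts are illustrative.
#include <cstdio>

// Each matmul output of dimension d is split row-wise across nSlices nodes
// (root + workers), so d must divide evenly by the node count.
static bool canSlice(int d, int nSlices) {
    return d % nSlices == 0;
}

int main() {
    const int d = 8192; // assumed hidden dimension of Llama 3 70B

    // 3 nodes (1 root + 2 workers): 8192 % 3 != 0, so the assertion would fire.
    std::printf("3 nodes: %s\n", canSlice(d, 3) ? "ok" : "d %% nSlices != 0");

    // 4 nodes (1 root + 3 workers): 8192 % 4 == 0, so slicing works.
    std::printf("4 nodes: %s\n", canSlice(d, 4) ? "ok" : "d %% nSlices != 0");
    return 0;
}
```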
Yes, that was my issue. I fixed it by adding another worker. My bad for not reading the project description more closely.
I'm running inference on a q40 weight of llama-3-70b-instruct across 3 x86_64 machines, and I'm getting this error on my root node:
main: src/transformer.cpp:17: MatmulSlice::MatmulSlice(FloatType, int, int, int): Assertion `d % nSlices == 0' failed.
Any suggestions?