Open liutt1312 opened 8 months ago
Will using only CPU be faster than llama.cpp?
Compared with llama.cpp in CPU-only mode, yes. PowerInfer can reduce end-to-end FLOPs by roughly 50%, depending on the model architecture and activation sparsity, so a ~2x speedup in CPU-only decoding would be expected.
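As a rough sanity check on the ~2x figure, here is a back-of-envelope FLOPs estimate for one FFN layer; the dimensions and the 50% activation fraction below are illustrative assumptions, not PowerInfer measurements:

```python
# Back-of-envelope FLOPs estimate for a sparsely activated FFN layer.
# d_model, d_ff, and the active fraction are hypothetical example values.

def ffn_flops(d_model: int, d_ff: int, active_fraction: float = 1.0) -> int:
    # Up-projection + down-projection: two matmuls of d_model x d_ff,
    # 2 FLOPs per multiply-accumulate. Only `active_fraction` of the
    # d_ff neurons are predicted to fire, so only those rows are computed.
    active_ff = int(d_ff * active_fraction)
    return 2 * (2 * d_model * active_ff)

dense = ffn_flops(4096, 11008)                        # dense baseline per token
sparse = ffn_flops(4096, 11008, active_fraction=0.5)  # ~50% neurons active

print(f"dense:  {dense:,} FLOPs/token")
print(f"sparse: {sparse:,} FLOPs/token")
print(f"ideal speedup if compute-bound: {dense / sparse:.1f}x")
```

Halving the FLOPs gives an ideal 2x bound; the realized speedup also depends on how memory-bandwidth-bound the CPU decode path is.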