rikoras closed this issue 6 months ago
Actually, this is caused by our sparse down operator in the FFN. We use axpy to implement the matmul operator, so the output is composed of many concurrent add operations, which introduces slight floating-point fluctuations. For a stable output, we advise running PowerInfer with pure CPU inference on a single thread.
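The effect described above can be reproduced outside PowerInfer: IEEE-754 addition is not associative, so accumulating the same terms in a different order (as concurrent axpy updates across threads may do) can round to a slightly different sum. A minimal, self-contained illustration:

```python
# Minimal illustration (not PowerInfer code): floating-point addition is
# not associative, so the same terms summed in a different order can
# produce a different result. In a concurrent axpy-based matmul, thread
# scheduling decides the order, so run-to-run results can drift.
left = (0.1 + 0.2) + 0.3   # one accumulation order
right = 0.1 + (0.2 + 0.3)  # same terms, different grouping

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False: rounding depends on the order
```

A drift this small in a logit is normally invisible, but with greedy decoding (`--temp 0 --top-k 1`) it can flip the argmax token, after which the two generations diverge entirely.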
That makes it very clear! Thanks!
Prerequisites
Before submitting your issue, please ensure the following:
Problem description
I am conducting a series of performance analyses on PowerInfer. For the sake of stability, I need to obtain the same output on every execution. I have referred to https://github.com/SJTU-IPADS/PowerInfer/issues/109, but it did not work.
Command
./main -m ../../models/llama-relu-7b-sparse/llama-7b-relu.powerinfer.gguf --temp 0 -n 256 --seed 0 -t 8 --top-k 1 -p "Here is a code to calculate the first 20 primes"
Current behaviour
On a second execution, the output began with the same text but then diverged:
I wonder if the predictors have an effect on sampling.
Environment
This inconsistency does NOT appear on another device with: