Closed chenzhuofu closed 2 months ago
The example command
./inference/spec_infer/spec_infer -ll:gpu 4 -ll:fsize 14000 -ll:zsize 30000 -llm-model meta-llama/Llama-2-7b-hf -ssm-model JackFram/llama-68m -prompt /path/to/prompt.json -tensor-parallelism-degree 4 --fusion
is missing -ll:cpu 4. Without it, the program hangs before the Legion task background_serving_task is executed.
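A minimal sketch of the corrected invocation, with -ll:cpu 4 added next to the other -ll: options. The position of the flag is an assumption (it is assumed that the order of the -ll: flags does not matter), and /path/to/prompt.json is the placeholder from the original command. The script only assembles and prints the command; the final line shows how it could be launched.

```shell
#!/usr/bin/env bash
# Sketch only: the example command from this issue plus the missing -ll:cpu 4.
# Placing -ll:cpu right after -ll:gpu is an assumption, not a requirement stated in the issue.
args=(
  -ll:gpu 4
  -ll:cpu 4            # the flag this issue reports as missing
  -ll:fsize 14000
  -ll:zsize 30000
  -llm-model meta-llama/Llama-2-7b-hf
  -ssm-model JackFram/llama-68m
  -prompt /path/to/prompt.json   # placeholder path, replace with a real prompt file
  -tensor-parallelism-degree 4
  --fusion
)
cmd=(./inference/spec_infer/spec_infer "${args[@]}")

# Print the assembled command for inspection.
printf '%s ' "${cmd[@]}"; echo

# To actually launch it, uncomment:
# "${cmd[@]}"
```

Keeping the arguments in an array makes it easy to confirm that -ll:cpu is present before launching.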