flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

Docs: typo in C++ serving example #1371

Closed chenzhuofu closed 2 months ago

chenzhuofu commented 2 months ago

The example command

./inference/spec_infer/spec_infer -ll:gpu 4 -ll:fsize 14000 -ll:zsize 30000 -llm-model meta-llama/Llama-2-7b-hf -ssm-model JackFram/llama-68m -prompt /path/to/prompt.json -tensor-parallelism-degree 4 --fusion

omits -ll:cpu 4. Without that flag, the program hangs before it executes the Legion task background_serving_task.
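For reference, a minimal sketch of the corrected invocation: the same command with -ll:cpu 4 added. Where the flag sits among the other options is an assumption here, as is the BIN variable and the fallback that prints the command when the binary is not found. Neither comes from the docs.

```shell
#!/bin/sh
# Sketch of the corrected invocation: the documented command plus -ll:cpu 4.
# Assumption: the position of -ll:cpu 4 among the other options does not matter.
BIN=./inference/spec_infer/spec_infer   # hypothetical variable; path taken from the docs example

set -- -ll:cpu 4 -ll:gpu 4 -ll:fsize 14000 -ll:zsize 30000 \
  -llm-model meta-llama/Llama-2-7b-hf -ssm-model JackFram/llama-68m \
  -prompt /path/to/prompt.json -tensor-parallelism-degree 4 --fusion

if [ -x "$BIN" ]; then
  "$BIN" "$@"        # binary found: run it with the corrected arguments
else
  echo "$BIN $*"     # binary not found: print the command instead of failing
fi
```

Run it from the directory that contains inference/spec_infer/spec_infer, and replace /path/to/prompt.json with a real prompt file.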