FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

Spec infer C++ examples blocked #1319

Closed by reyna-abhyankar 6 months ago

reyna-abhyankar commented 7 months ago

Specifically, https://github.com/flexflow/FlexFlow/blob/0d75c1042bf87e45684bcb3679cfc9f39a87e589/src/runtime/request_manager.cc#L314

This may be a background-server issue, since the serve_xxxx() functions are never called.

This happens when running the spec_infer.cc and incr_decoding.cc examples.

jiazhihao commented 7 months ago

What's the command line you used?

reyna-abhyankar commented 7 months ago

> What's the command line you used?

./FlexFlow/build/inference/spec_infer/spec_infer -ll:gpu 1 -ll:fsize 32000 -ll:zsize 14000 -llm-model facebook/opt-6.7b -ssm-model facebook/opt-125m -prompt ./data/chatbot_short.json -output-file test_output

goliaro commented 6 months ago

@reyna-abhyankar, you need to add the flag -ll:cpu 4. We should update the docs to mention this. Let me know if you are still encountering the issue after adding the CPU flag.
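For reference, here is the original command from above with the suggested -ll:cpu 4 flag added. This is a sketch assuming the other flag values (GPU count, framebuffer and zero-copy memory sizes, model names, and file paths) from the earlier comment are still appropriate for your setup:

```shell
# Original spec_infer invocation, extended with -ll:cpu 4 as suggested.
# -ll:gpu 1      : number of GPUs for the Legion low-level runtime
# -ll:cpu 4      : number of CPU processors (the missing flag causing the block)
# -ll:fsize/-ll:zsize : GPU framebuffer / zero-copy memory sizes in MB
./FlexFlow/build/inference/spec_infer/spec_infer \
    -ll:gpu 1 -ll:cpu 4 -ll:fsize 32000 -ll:zsize 14000 \
    -llm-model facebook/opt-6.7b \
    -ssm-model facebook/opt-125m \
    -prompt ./data/chatbot_short.json \
    -output-file test_output
```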

goliaro commented 6 months ago

Closing for now after our discussion on Slack; feel free to reopen if the issue is not fixed.