flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

[BUG] `llm.generate` returns `cdata` type #994

Closed brianyu-nexusflowai closed 1 year ago

brianyu-nexusflowai commented 1 year ago

Hey FlexFlow team!

For some reason the output of `llm.generate` is a cffi struct with a single attribute `impl` of type `void *`. I can see the generation in the RequestManager log, but I can't access the return value.

Using the example from https://flexflow.readthedocs.io/en/latest/serve_overview.html#incremental-decoding with `zero_copy_memory_per_gpu` changed to `zero_copy_memory_per_node`:

...
[0 - 7fc9b0859740]  266.021465 {3}{RequestManager}: Output token is: 4512
[0 - 7fc9b0859740]  266.048439 {3}{RequestManager}: Output token is: 363
[0 - 7fc9b0859740]  266.048460 {3}{RequestManager}: [Done] guid(1000000) final_length(128)
[0 - 7fc9b0859740]  266.048842 {3}{RequestManager}: Final output:  ⁇  Here are some travel tips for Tokyo:
Tokyo is a city of neighborhoods, so it’s best to know where you’re going before you go.
Public transportation is amazing, so take advantage of it.
Japanese culture is very different from Western culture, so be respectful and be prepared to be a little uncomfortable.
Most importantly, have fun!
Posted in Travel, Uncategorized | Tagged Japan, Tokyo, Travel, Travel Tips      | Leave a reply
Money Matters: Tips for
[0 - 7fc9b0859740]  266.048872 {3}{RequestManager}: [Profile] guid(1000000) decoding_steps(118) start(262980704.0) finish(266048864.0) latency(3068160.0) acc_latency(3068160.0)
>>> result
<cdata 'flexflow_generation_result_t' owning 8 bytes>
>>> result.impl
<cdata 'void *' 0x7ffdb92aeba0>
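
To illustrate why a handle like this is unusable from Python as-is, here is a minimal sketch of the opaque-handle pattern. It uses the standard-library `ctypes` module instead of cffi, and every name in it (`GenerationResultHandle`, `Payload`, `unwrap`) is hypothetical; it does not reflect FlexFlow's actual structs or bindings. The point is only that a struct holding a single `void *` tells the caller nothing until the binding casts it back to a concrete type.

```python
import ctypes


# Hypothetical opaque handle: a single untyped pointer, like the `impl` field above.
class GenerationResultHandle(ctypes.Structure):
    _fields_ = [("impl", ctypes.c_void_p)]


# Hypothetical payload behind the pointer (NOT FlexFlow's real layout).
class Payload(ctypes.Structure):
    _fields_ = [("length", ctypes.c_int), ("tokens", ctypes.c_int * 4)]


def unwrap(handle: GenerationResultHandle) -> list:
    """Cast the opaque pointer back to the concrete type and copy the tokens out."""
    payload_ptr = ctypes.cast(ctypes.c_void_p(handle.impl), ctypes.POINTER(Payload))
    payload = payload_ptr.contents
    return list(payload.tokens[: payload.length])


# Build a payload and hide it behind the opaque handle.
payload = Payload(3, (ctypes.c_int * 4)(4512, 363, 2, 0))
handle = GenerationResultHandle(impl=ctypes.addressof(payload))

# The raw handle only exposes an address; the data is reachable only via unwrap().
print(handle.impl)
print(unwrap(handle))
```

In this sketch the caller gets usable data only because `unwrap` knows the concrete layout; a binding that returns the raw handle leaves the caller with just an address.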

Another unrelated question: what's the significance of the ⁇ in the output?

Thanks!

jiazhihao commented 1 year ago

I will push a fix for this issue.

Another unrelated question: what's the significance of the ⁇ in the output?

That's the BOS token. We will remove that token from RequestManager's log.
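
As a rough sketch of what dropping the BOS token before logging could look like, the helper below removes a leading BOS id from a token list. The function name and the BOS id value are assumptions for illustration; this is not the RequestManager's code, and the real id depends on the model's tokenizer.

```python
from typing import List, Sequence

# Assumed BOS id for illustration only; check the tokenizer of the model in use.
ASSUMED_BOS_ID = 1


def strip_leading_bos(token_ids: Sequence[int], bos_id: int) -> List[int]:
    """Return the token ids without a leading BOS token, if one is present."""
    ids = list(token_ids)
    if ids and ids[0] == bos_id:
        return ids[1:]
    return ids


# Only a BOS token in the first position is removed.
print(strip_leading_bos([ASSUMED_BOS_ID, 4512, 363], ASSUMED_BOS_ID))
```

Decoding the stripped list would then omit the leading ⁇ marker from the logged text.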

jiazhihao commented 1 year ago

This issue should now be fixed.