WukLab / preble

Stateful LLM Serving
Apache License 2.0

The keys that do not exist in the response chunk dictionary #74

Open ljjlovefree opened 2 months ago

ljjlovefree commented 2 months ago

The response chunk dictionary does not contain the ['meta_info']['arrival_time'] and ['meta_info']['append_to_queue_time'] keys. When the async_send_request function processes the response, only the first chunk is returned to the user, so the user sees just one token of output. Patch: https://github.com/ljjlovefree/preble_1/commit/e5e117cfd2045d6253d544530be0a0cc6c598a42
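For anyone hitting the same symptom, here is a minimal sketch of the defensive pattern, not the code from the linked patch. It assumes each chunk is a plain dict with a "text" field and an optional "meta_info" dict; the helper names extract_timing and collect_text are hypothetical and do not come from the Preble codebase.

```python
from typing import Any, Dict, Iterable, List, Optional


def extract_timing(chunk: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Read the optional timing fields from one response chunk.

    A field the server did not send comes back as None instead of
    raising KeyError. (Hypothetical helper, for illustration only.)
    """
    meta = chunk.get("meta_info") or {}
    return {
        "arrival_time": meta.get("arrival_time"),
        "append_to_queue_time": meta.get("append_to_queue_time"),
    }


def collect_text(chunks: Iterable[Dict[str, Any]]) -> List[str]:
    """Gather the text of every chunk, even when timing keys are missing.

    Assumes each chunk carries its text under "text"; adjust the key
    if your server uses a different response shape.
    """
    pieces: List[str] = []
    for chunk in chunks:
        extract_timing(chunk)  # safe on chunks without the extra keys
        pieces.append(chunk.get("text", ""))
    return pieces
```

With lookups like these, a chunk that lacks the two timing keys no longer raises partway through the stream, so the remaining chunks can still be consumed.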

vikranth22446 commented 3 weeks ago

Sorry for the late reply. This happens because we slightly modified sglang to expose information such as arrival_time, so you need to use the custom version of sglang that I have set up.

There is a plan to integrate these ideas into the sglang codebase at some point.