Open ljjlovefree opened 2 months ago
Sorry for the late reply. This happens because we slightly modified sglang to expose information such as arrival_time, so you need to use the custom version of sglang that I have set up.
There is a plan to integrate these ideas into the sglang codebase at some point.
The chunk dictionary of the response does not contain the ['meta_info']['arrival_time'] and ['meta_info']['append_to_queue_time'] keys. When the async_send_request function processes the response, only the first chunk is returned to the user, so the user sees only one token of output. Patch: https://github.com/ljjlovefree/preble_1/commit/e5e117cfd2045d6253d544530be0a0cc6c598a42
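
For anyone hitting this on stock sglang, here is a minimal sketch of reading the timing fields defensively so that a missing key does not cut the stream short. It is not the code from the linked patch and not the actual async_send_request implementation. The helper names `extract_timing` and `consume_chunks` are hypothetical, and the assumption that each chunk is a dict with a "text" field and an optional "meta_info" dict is mine.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Keys that only the modified sglang build is reported to emit.
TIMING_KEYS = ("arrival_time", "append_to_queue_time")


def extract_timing(chunk: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Return the timing fields of one chunk, or None for any that are absent."""
    # Hypothetical helper: tolerate a missing or null "meta_info" entry.
    meta = chunk.get("meta_info") or {}
    return {key: meta.get(key) for key in TIMING_KEYS}


def consume_chunks(
    chunks: Iterable[Dict[str, Any]],
) -> Tuple[List[str], Dict[str, Optional[float]]]:
    """Collect the text of every chunk and the first timing values seen."""
    texts: List[str] = []
    timing: Dict[str, Optional[float]] = {key: None for key in TIMING_KEYS}
    for chunk in chunks:
        # Assumption: the generated text lives under "text" in each chunk.
        texts.append(chunk.get("text", ""))
        for key, value in extract_timing(chunk).items():
            # Keep the first non-missing value; never raise on absent keys.
            if value is not None and timing[key] is None:
                timing[key] = value
    return texts, timing
```

With this shape, chunks that lack the timing keys still contribute their text, and the timing values simply stay None when the server does not provide them.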