flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0
1.59k stars 218 forks

Split prefilling batch with decoding batch for incremental decoding. #1345

Closed zwang86 closed 1 month ago

zwang86 commented 3 months ago

Description of changes:

Related Issues:

Linked Issues:

Issues closed by this PR:


jiazhihao commented 1 month ago

@zwang86 @zikun-li Has this already been merged to the spec-scheduler branch?

zwang86 commented 1 month ago

@jiazhihao This PR is out of date; we can close it now.