flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

Specscheduler new attention #1430

Closed. chenzhuofu closed this 1 week ago

chenzhuofu commented 2 weeks ago

Description of changes:

Integrate FlashInfer into the self-attention implementation.

