cli99 / llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference
Apache License 2.0

[REQUEST] Support for paged attention? #16

Closed cnjsdfcy closed 11 months ago

cnjsdfcy commented 11 months ago

Hi,

Will this project support paged attention (https://vllm.ai/)?

Thanks, Jason

cli99 commented 11 months ago

Hi @cnjsdfcy, unfortunately paged-attention support is not on the tool's current todo list. Since paged attention is essentially a caching behavior of the serving system, it is hard to model or to make assumptions about when estimating latency or throughput. The tool aims to provide lower-bound performance from the model's point of view, so it makes no assumptions about the serving system's workload or its caching behavior. If you have ideas on how we could model paged attention, please share. Happy to work on it together.
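
For context, here is a minimal, hypothetical sketch (not part of llm-analysis; all function names, the block size, and the 7B-like configuration below are assumptions for illustration) of why the benefit of paged attention is workload-dependent: a serving system that pre-allocates contiguous KV cache for the maximum sequence length reserves far more memory than one that allocates fixed-size blocks for the tokens actually produced, and the size of that gap depends entirely on the request mix, which a model-centric lower-bound analysis does not assume.

```python
# Hypothetical sketch: contiguous max-length KV-cache pre-allocation vs.
# vLLM-style paged (block) allocation. Names and the workload are illustrative.

def kv_cache_bytes_contiguous(batch_size, max_seq_len, num_layers, num_heads,
                              head_dim, bytes_per_elem=2):
    """KV cache reserved up front for max_seq_len tokens per sequence (K and V)."""
    return 2 * batch_size * max_seq_len * num_layers * num_heads * head_dim * bytes_per_elem


def kv_cache_bytes_paged(actual_seq_lens, block_size, num_layers, num_heads,
                         head_dim, bytes_per_elem=2):
    """KV cache allocated in fixed-size blocks, proportional to tokens actually held."""
    total = 0
    for seq_len in actual_seq_lens:
        num_blocks = -(-seq_len // block_size)  # ceiling division
        total += 2 * num_blocks * block_size * num_layers * num_heads * head_dim * bytes_per_elem
    return total


if __name__ == "__main__":
    # Assumed 7B-like config: 32 layers, 32 heads, head_dim 128, fp16 KV cache.
    layers, heads, head_dim = 32, 32, 128
    max_seq_len, batch = 4096, 8
    # The savings depend entirely on the request mix; this list is made up.
    actual_lens = [512, 900, 300, 2048, 128, 700, 1500, 256]

    contiguous = kv_cache_bytes_contiguous(batch, max_seq_len, layers, heads, head_dim)
    paged = kv_cache_bytes_paged(actual_lens, 16, layers, heads, head_dim)
    print(f"contiguous: {contiguous / 2**30:.1f} GiB, paged: {paged / 2**30:.1f} GiB")
```

With these assumed numbers the contiguous reservation is roughly 16 GiB while the paged allocation is about 3 GiB, but a different request mix would give a different ratio, which is why the effect is hard to capture without system-level workload assumptions.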

cnjsdfcy commented 11 months ago

Hi @cli99, thanks for your reply. It looks like this kind of optimization needs more detailed system-level modeling.

Closing the ticket, thanks.