Zhao-Dongyu / Zhao-Dongyu.github.io

Efficient Streaming Language Models with Attention Sinks | Zhao Dongyu's Blog #19

Open Zhao-Dongyu opened 1 week ago

Zhao-Dongyu commented 1 week ago

https://zhao-dongyu.github.io/2024/11/04/108_streaming-llm/

Deploy LLMs for infinite-length inputs without sacrificing efficiency and performance.