Efficient Streaming Language Models with Attention Sinks

Zhao-Dongyu commented 1 week ago

Deploy LLMs for infinite-length inputs without sacrificing efficiency and performance.
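That claim rests on one idea from the paper. Keep the key/value entries of the first few tokens, which the paper calls attention sinks, together with a rolling window of the most recent tokens. Evict everything in between, so cache size stays constant however long the stream runs. The paper also assigns positions by slot within the cache rather than by position in the original text. The sketch below illustrates only this eviction and positioning policy. It assumes four sink tokens, the number the paper uses. The class `SinkKVCache` and its methods are hypothetical, and plain integers stand in for real per-layer key/value tensors.

```python
from collections import deque


class SinkKVCache:
    """Hypothetical sketch of an attention-sink KV cache policy.

    Integers stand in for per-token (key, value) entries; a real model
    would store one tensor pair per layer and head.
    """

    def __init__(self, num_sinks: int = 4, window: int = 8):
        self.num_sinks = num_sinks
        self.sinks = []                      # first tokens, never evicted
        self.recent = deque(maxlen=window)   # deque drops its oldest item when full

    def append(self, entry) -> None:
        # The first num_sinks tokens become permanent attention sinks;
        # every later token goes into the rolling window.
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(entry)
        else:
            self.recent.append(entry)

    def entries(self) -> list:
        # What attention sees: sinks followed by the recent window.
        return self.sinks + list(self.recent)

    def positions(self) -> list:
        # Positions are slots within the cache, not indices in the original text.
        return list(range(len(self.sinks) + len(self.recent)))


cache = SinkKVCache(num_sinks=4, window=3)
for token in range(10):
    cache.append(token)
print(cache.entries())    # [0, 1, 2, 3, 7, 8, 9]
print(cache.positions())  # [0, 1, 2, 3, 4, 5, 6]
```

Tokens 4 to 6 have been evicted, yet the cache never exceeds `num_sinks + window` entries. This is why memory and per-token cost stay bounded on arbitrarily long inputs.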