Closed (jqueguiner closed this issue 9 months ago)
Hi @jqueguiner ,
Thanks for your support. We are considering adding that feature to TensorRT-LLM, but there is nothing concrete at this point. We are not ready to commit to a date for when it will be added (if ever).
Thanks, Julien
Hi @jqueguiner . StreamingLLM, a technique which takes advantage of Attention Sinks, has been added to the main branch!
See the Llama example. Take a look and let us know what you think!
Hi @ncomly-nvidia, is H2O supported in the latest main branch? Thanks.
Hi 👋 and thanks for the amazing job! Can’t wait to see the developments in the next few weeks and months.
Any plans to work on attention sinks?