deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0
182 stars, 59 forks

Plan to use Attention Sinks? #1470

Open spring1915 opened 6 months ago

spring1915 commented 6 months ago

Description

Do you intend to add Attention Sinks streaming as an alternative to the current streaming implementations for the Hugging Face, vLLM, and scheduler rolling-batch modes?
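For context, the core idea behind Attention Sinks (from the StreamingLLM work) is a KV-cache eviction policy. It always keeps the key/value entries of the first few tokens (the "sinks") plus a sliding window of the most recent tokens, evicts everything in between, and assigns positional information by slot in the cache rather than by position in the original text. The sketch below only illustrates that policy in plain Python. `attention_sink_keep_indices`, `SinkKVCache`, and the `num_sink`/`window` defaults are hypothetical names and illustrative values, not part of DJL Serving or any library API.

```python
from typing import List


def attention_sink_keep_indices(cache_len: int, num_sink: int = 4, window: int = 1020) -> List[int]:
    """Hypothetical helper: cache slots retained under an attention-sink policy.

    The first `num_sink` slots (the sinks) are always kept, plus the most
    recent `window` slots; everything in between is evicted.
    """
    if num_sink < 0 or window < 0:
        raise ValueError("num_sink and window must be non-negative")
    if cache_len <= num_sink + window:
        # The cache still fits within the budget, so nothing is evicted.
        return list(range(cache_len))
    sinks = list(range(num_sink))
    recent = list(range(cache_len - window, cache_len))
    return sinks + recent


class SinkKVCache:
    """Toy cache holding one opaque KV entry per token (illustration only)."""

    def __init__(self, num_sink: int = 4, window: int = 1020) -> None:
        self.num_sink = num_sink
        self.window = window
        self.entries: List[object] = []

    def append(self, kv: object) -> None:
        # Add the newest token's entry, then evict the middle if over budget.
        self.entries.append(kv)
        keep = attention_sink_keep_indices(len(self.entries), self.num_sink, self.window)
        self.entries = [self.entries[i] for i in keep]

    def position_ids(self) -> List[int]:
        # Positions follow the slot in the cache, not the original token index.
        return list(range(len(self.entries)))


if __name__ == "__main__":
    cache = SinkKVCache(num_sink=2, window=3)
    for token_id in range(10):
        cache.append(token_id)
    print(cache.entries)         # [0, 1, 7, 8, 9]
    print(cache.position_ids())  # [0, 1, 2, 3, 4]
```

Because the cache never grows past `num_sink + window` entries, memory per request stays bounded during long streaming generation, which is what would make this an alternative to the existing streaming paths rather than a replacement for batching itself.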

lanking520 commented 6 months ago

Thanks for the suggestion. We will take a look and respond here.