neubig opened 1 year ago
We could see whether Unlimiformer could run in TGI (Text Generation Inference). I think the core of the work would be modifying the architecture to use flash-attention/vLLM whenever possible. @abertsch72, if this is something you want to try, I'm happy to help!
@CoderPat I'll reach out to you about this in the next few weeks!
It'd be cool if it were possible to query Unlimiformer through an API similar to the OpenAI one. Would it be possible to create an API server for Unlimiformer-based models?
Reference: https://github.com/neulab/prompt2model/pull/344#discussion_r1320537075
cc: @abertsch72 , @coderpat
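The API-server idea above could be prototyped as a small HTTP endpoint that accepts a request body loosely modeled on OpenAI's `/v1/completions` format. This is a minimal sketch using only the Python standard library, not Unlimiformer's own API. `echo_generate` is a hypothetical placeholder where a real server would call an Unlimiformer-wrapped model. Only `prompt` and `max_tokens` are handled, the `max_tokens` default is illustrative, and the response carries only a subset of OpenAI's fields.

```python
import json
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical signature for an Unlimiformer-based generator:
# (prompt, max_tokens) -> generated text.
GenerateFn = Callable[[str, int], str]


def echo_generate(prompt: str, max_tokens: int) -> str:
    """Hypothetical placeholder model: returns the first `max_tokens` words of the prompt."""
    return " ".join(prompt.split()[:max_tokens])


def handle_completion(payload: dict, generate_fn: GenerateFn, model_name: str) -> dict:
    """Map an OpenAI-style completions request body to a response body (subset of fields)."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    prompt = payload.get("prompt")
    if not isinstance(prompt, str):
        raise ValueError("'prompt' must be a string")
    max_tokens = int(payload.get("max_tokens", 16))  # default chosen for illustration
    text = generate_fn(prompt, max_tokens)
    return {
        "id": f"cmpl-{uuid.uuid4().hex}",
        "object": "text_completion",
        "created": int(time.time()),
        "model": model_name,
        # A real model would report why generation stopped (e.g. "stop" or "length").
        "choices": [{"index": 0, "text": text, "finish_reason": None}],
    }


def make_handler(generate_fn: GenerateFn, model_name: str):
    """Build an http.server handler class bound to a specific generator."""

    class CompletionHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/v1/completions":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length))
                body, status = handle_completion(payload, generate_fn, model_name), 200
            except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError
                body, status = {"error": {"message": str(exc)}}, 400
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return CompletionHandler


if __name__ == "__main__":
    handler = make_handler(echo_generate, "unlimiformer-demo")
    HTTPServer(("127.0.0.1", 8000), handler).serve_forever()
```

Keeping the request-to-response mapping in a plain function (`handle_completion`) separate from the HTTP plumbing makes it easy to test and to swap `http.server`, which is not meant for production use, for a sturdier framework later. Once running, the sketch could be queried with `curl -X POST localhost:8000/v1/completions -d '{"prompt": "Hello there", "max_tokens": 8}'`.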