abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

API server for unlimiformer #39

Open neubig opened 1 year ago

neubig commented 1 year ago

It'd be cool if it were possible to query Unlimiformer through an API similar to the OpenAI one. Would it be possible to create an API server for Unlimiformer-based models?
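To make the request concrete, here is a minimal sketch of what an OpenAI-style completions endpoint could look like, using only the Python standard library. Everything here is an assumption for illustration: `generate_text` is a hypothetical placeholder for a call into an Unlimiformer-wrapped model, the `/v1/completions` path and response fields only loosely mirror the OpenAI completions format, and `start_server` and `query` are made-up helper names. None of it is part of the Unlimiformer repo.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate_text(prompt: str, max_tokens: int) -> str:
    """Hypothetical placeholder: a real server would call an
    Unlimiformer-wrapped model here instead of returning a stub."""
    return f"[stub completion for a {len(prompt)}-character prompt]"


class CompletionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one (assumed) endpoint is served in this sketch.
        if self.path != "/v1/completions":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            request = json.loads(self.rfile.read(length))
            prompt = request["prompt"]
        except (json.JSONDecodeError, KeyError, TypeError):
            self.send_error(400, "expected a JSON object with a 'prompt' field")
            return
        text = generate_text(prompt, int(request.get("max_tokens", 16)))
        # Response shape loosely follows the OpenAI completions format.
        body = json.dumps({
            "object": "text_completion",
            "model": request.get("model", "unlimiformer-stub"),
            "choices": [{"index": 0, "text": text, "finish_reason": "stop"}],
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_server(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CompletionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(server: HTTPServer, prompt: str, max_tokens: int = 16) -> dict:
    """Send one completion request to the local server and decode the reply."""
    url = f"http://127.0.0.1:{server.server_address[1]}/v1/completions"
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With something like this in place, a client would call `query(start_server(), long_document)` and read the generated text from `reply["choices"][0]["text"]`. The point of matching the OpenAI request shape is that existing client code could, in principle, be pointed at the server with little change.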

Reference: https://github.com/neulab/prompt2model/pull/344#discussion_r1320537075

cc: @abertsch72, @coderpat

CoderPat commented 1 year ago

We could see whether Unlimiformer could run in TGI (Text Generation Inference). I think the core of the work would be modifying the architecture to use flash-attention/vLLM wherever possible. @abertsch72 if this is something you wanna try, I'm happy to help!

abertsch72 commented 1 year ago

@CoderPat I'll reach out to you about this in the next few weeks!