abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License
1.05k stars, 77 forks

API server for unlimiformer #39

Open: neubig opened this issue 12 months ago

neubig commented 12 months ago

It'd be cool if it were possible to query Unlimiformer through an API similar to the OpenAI one. Would it be possible to create an API server for Unlimiformer-based models?

Reference: https://github.com/neulab/prompt2model/pull/344#discussion_r1320537075

cc: @abertsch72, @coderpat
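
For illustration only, here is a rough sketch of what an OpenAI-style completions endpoint could look like. Nothing here comes from the Unlimiformer codebase: `stub_generate`, `handle_completion`, `CompletionHandler`, the `/v1/completions` path, and the default model name are all assumptions, and the stub simply echoes prompt words where a real server would call an Unlimiformer-wrapped model. The response fields loosely follow the shape of OpenAI's legacy text completion objects.

```python
import json
import time
import uuid
from http.server import BaseHTTPRequestHandler
from typing import Any, Callable, Dict


def stub_generate(prompt: str, max_tokens: int) -> str:
    """Hypothetical stand-in for a model call: echoes the first
    `max_tokens` whitespace-separated words of the prompt."""
    return " ".join(prompt.split()[:max_tokens])


def handle_completion(
    body: Dict[str, Any],
    generate: Callable[[str, int], str] = stub_generate,
) -> Dict[str, Any]:
    """Turn a completions-style request body into a response dict.

    Raises KeyError if the request has no 'prompt' field.
    """
    prompt = body["prompt"]
    max_tokens = int(body.get("max_tokens", 16))
    text = generate(prompt, max_tokens)
    return {
        "id": f"cmpl-{uuid.uuid4().hex}",
        "object": "text_completion",
        "created": int(time.time()),
        # Assumed default model name, not an official identifier.
        "model": body.get("model", "unlimiformer"),
        "choices": [{"index": 0, "text": text, "finish_reason": "stop"}],
    }


class CompletionHandler(BaseHTTPRequestHandler):
    """Minimal HTTP wrapper around handle_completion (assumed route)."""

    def do_POST(self) -> None:
        if self.path != "/v1/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
            payload = handle_completion(body)
        except (KeyError, ValueError):
            self.send_error(400, "expected a JSON body with a 'prompt' field")
            return
        data = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, one could pass `CompletionHandler` to `http.server.HTTPServer(("127.0.0.1", 8000), CompletionHandler)` and call `serve_forever()`. Keeping `handle_completion` separate from the HTTP layer means the `generate` callable can be swapped for a real model without touching the request parsing.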

CoderPat commented 11 months ago

We could see whether Unlimiformer could run in TGI. I think the core of the work would be modifying the architecture to use flash-attention/vLLM wherever possible. @abertsch72, if this is something you wanna try, I'm happy to help!

abertsch72 commented 11 months ago

@CoderPat I'll reach out to you about this in the next few weeks!