abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License
1.05k stars 77 forks

Support other LLMs? #34

Closed: chaunceyliu30 closed this issue 7 months ago

chaunceyliu30 commented 1 year ago

Is it possible to support other LLMs that perform better on Chinese, like qwen-7b-chat or chatglm2-6b? Or could you give instructions on how to do so? Thank you :)

urialon commented 1 year ago

Hi @chaunceyliu30, thank you for your interest in our work!

You basically need to create a class like UnlimiformerLlama, customized for the specific layer names of your desired architecture: https://github.com/abertsch72/unlimiformer/blob/main/src/unlimiformer.py#L1015

Then tell Unlimiformer to use your custom class for that architecture here:

https://github.com/abertsch72/unlimiformer/blob/main/src/unlimiformer.py#L794
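
For illustration, here is a minimal sketch of what such a subclass might look like. It assumes the base class exposes overridable accessors for the attention submodules; the hook names below are hypothetical (check the actual override points in src/unlimiformer.py), and the Qwen attribute names are illustrative as well:

```python
from unlimiformer import Unlimiformer  # base class in src/unlimiformer.py

class UnlimiformerQwen(Unlimiformer):
    """Hypothetical adapter: maps Unlimiformer's generic hooks onto a
    Qwen-style model's module names. The retrieval logic itself stays
    in the base class."""

    def self_attention(self, decoder_layer):
        # Qwen-style models name the attention submodule `attn`
        # (Llama uses `self_attn`), so point the hook at that module.
        return decoder_layer.attn

    def key_value_projections(self, decoder_layer):
        # Projections whose outputs get stored in the datastore; Qwen
        # fuses q/k/v into one `c_attn` projection (illustrative).
        return decoder_layer.attn.c_attn
```

The registration step at the second link then just maps the new model class to this subclass, e.g. an entry like `QWenLMHeadModel: UnlimiformerQwen` in the model-class-to-Unlimiformer-class dispatch there (match the structure of the existing entries).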

However, as a first step, I think the easiest option is to find a Llama-2-based model that was trained specifically for Chinese. As long as the model uses the same Llama architecture, you won't need to make any modifications.
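
For example, something along these lines should then work unchanged (the checkpoint name is illustrative, and `convert_model`'s exact signature should be checked against src/unlimiformer.py):

```python
from transformers import AutoTokenizer, LlamaForCausalLM
from unlimiformer import Unlimiformer

# Illustrative checkpoint: any model with the stock Llama architecture
# that was trained for Chinese would do.
checkpoint = "hfl/chinese-llama-2-7b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LlamaForCausalLM.from_pretrained(checkpoint)

# Wrap the unmodified model with Unlimiformer at test time; verify
# convert_model's signature in src/unlimiformer.py before relying on it.
model = Unlimiformer.convert_model(model)
```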

Let us know if you have any questions.

Best, Uri

mczhuge commented 12 months ago

Hi, I have a quick question: does this repo use pretrained Unlimiformer weights when supporting Llama models? Or does it just use Llama and faiss to build the index and then generate? Thanks!

urialon commented 12 months ago

Hi @mczhuge, thank you for your interest in our work!

For the Llama models, we currently use the base Llama weights; no special training is needed!
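
For intuition, a toy sketch of that index-then-retrieve step with faiss (an illustration of the idea, not the repo's actual code):

```python
import faiss
import torch

# At test time, hidden states of the long input are stored in a k-NN
# index, and each attention query retrieves only its top-k keys from
# that index; no extra training is involved. Values here are random toys.
dim = 4096
keys = torch.randn(100_000, dim)        # key vectors for the long input

index = faiss.IndexFlatIP(dim)          # exact inner-product index
index.add(keys.numpy())                 # faiss expects float32 numpy arrays

query = torch.randn(1, dim).numpy()     # one attention query vector
scores, ids = index.search(query, 16)   # attend only to the 16 nearest keys
```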

Best, Uri