abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

Is it possible to change the base model? #10

Closed: thangnm99 closed this issue 1 year ago

thangnm99 commented 1 year ago

Thanks for your work. I have skimmed the source code and the paper, and I see that only a few base models are supported (all seq2seq models). My question: can I replace the seq2seq model with a sequence-classification or token-classification model (from Hugging Face, like BERT or RoBERTa)?

urialon commented 1 year ago

Hi @thangnm99 , Thank you for your interest in our work!

The main novelty in our work is the modification of the cross-attention in seq2seq models (which can be applied to the causal attention in decoder-only models as well).

Our code currently does not support BERT-like models, but in principle, the same ideas can be applied to BERT's self-attention. We currently don't have the capacity to implement that, but it would be great to have contributions from the community.
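
To make the core idea concrete, here is a minimal, self-contained sketch (in plain PyTorch + faiss, with illustrative names; this is not the repo's actual API) of retrieval-based attention: the long input's keys and values are placed in a faiss index, and each query attends only over its top-k retrieved positions rather than the full sequence. The paper's implementation additionally folds the attention projections into the query so that a single index can serve every layer and head; this sketch skips that optimization.

```python
import torch
import torch.nn.functional as F
import faiss  # exact nearest-neighbor search over the long input's keys


class KnnAttention(torch.nn.Module):
    """Attend over only the top-k retrieved positions of a long input.

    Illustrative sketch of the Unlimiformer idea, not the repo's API.
    """

    def __init__(self, hidden_size: int, top_k: int = 64):
        super().__init__()
        self.top_k = top_k
        self.index = faiss.IndexFlatIP(hidden_size)  # inner-product search
        self.keys = None    # (num_tokens, hidden_size)
        self.values = None  # (num_tokens, hidden_size)

    @torch.no_grad()
    def build_index(self, keys: torch.Tensor, values: torch.Tensor):
        # keys/values are produced by encoding the long input in chunks,
        # so the full sequence never has to fit into attention at once.
        self.keys, self.values = keys, values
        self.index.reset()
        self.index.add(keys.detach().float().cpu().numpy())

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (num_queries, hidden_size); assumes keys, values, and
        # queries live on the same device for the sake of the sketch.
        _, ids = self.index.search(
            queries.detach().float().cpu().numpy(), self.top_k
        )
        ids = torch.from_numpy(ids).to(queries.device)   # (q, top_k)
        k = self.keys[ids]                               # (q, top_k, hidden)
        v = self.values[ids]                             # (q, top_k, hidden)
        scores = torch.einsum("qd,qkd->qk", queries, k)
        scores = scores / (queries.size(-1) ** 0.5)
        probs = F.softmax(scores, dim=-1)
        return torch.einsum("qk,qkd->qd", probs, v)
```

Because the retrieved top-k positions receive nearly all of the attention mass in practice, attending over them alone approximates full attention while keeping memory roughly constant in the input length.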

Best, Uri

9au5a commented 11 months ago

Hi :-)

First, thank you again for the great work and for publishing the code.

Question regarding this issue: how would you estimate the time needed to implement support for BERT-like (encoder-only) models? Could it be done in a few days, or is it more like 3-4 weeks?

I want to perform some classification and possibly regression tasks on long texts, and I would like to evaluate Unlimiformer for this, among other approaches. Classification should be okay-ish with a seq2seq model; for regression, though, an encoder-only model would be a better fit. ;-)

I'm asking about the estimated time needed because I have time constraints myself. However, with appropriate support, I would love to contribute to your project, since it has enormous potential and could help me a lot.

Best, Paula

urialon commented 11 months ago

Hi @9au5a , Thank you for your interest in our work!

It's hard to estimate; it depends on your experience with tensors and with building models...

I suggest looking at the UnlimiformerLLama (https://github.com/abertsch72/unlimiformer/blob/main/src/unlimiformer.py#L1015) and UnlimiformerBart (https://github.com/abertsch72/unlimiformer/blob/main/src/unlimiformer.py#L843) classes to estimate the effort needed.
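
To give a rough sense of what such a port involves: the model-specific classes mainly tell Unlimiformer where each architecture keeps its attention layers and query/key/value projections. Below is a hypothetical skeleton for BERT; the `UnlimiformerBert` class and its method names are invented for illustration, mirroring the role of the classes above rather than the actual base-class interface. The Hugging Face attribute paths, however, are real for `BertModel`.

```python
from transformers import BertModel


class UnlimiformerBert:
    """Hypothetical mapping of BERT's attention internals.

    Illustrative only; the real Unlimiformer base class defines its own hooks.
    """

    def __init__(self, model: BertModel):
        self.model = model

    def attention_layers(self):
        # Each BertSelfAttention module owns one layer's q/k/v projections.
        return [layer.attention.self for layer in self.model.encoder.layer]

    def query_projection(self, layer_idx: int):
        return self.model.encoder.layer[layer_idx].attention.self.query

    def key_projection(self, layer_idx: int):
        return self.model.encoder.layer[layer_idx].attention.self.key

    def value_projection(self, layer_idx: int):
        return self.model.encoder.layer[layer_idx].attention.self.value


# Usage sketch:
# model = BertModel.from_pretrained("bert-base-uncased")
# wrapper = UnlimiformerBert(model)
# k_proj = wrapper.key_projection(0)  # a torch.nn.Linear module
```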

Let us know if you have any questions. If it eventually works, we would love to adopt your contributions.

Best, Uri