abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

About adding a prefix and input length #47

Closed apapoudakis closed 7 months ago

apapoudakis commented 11 months ago

Hello and thank you for this great work!

1) Is it possible to add the same prefix in front of every chunk? For instance, as you mention in #20, for a QA task we want to add the question before every chunk. Do we need to make any other changes to the codebase, or is the input_prefix_column argument enough?

2) Have you also tried using models that can process inputs longer than 4k tokens?

urialon commented 11 months ago

Hi @apapoudakis, thank you for your interest in our work!

  1. If I'm not mistaken, when you set input_prefix_column and also use the flags --max_prefix_length 64 and --pad_prefix=True, the prefix column will also be automatically tokenized and added to the tokenized input.

But please verify it, for example, by decoding the input right before the call to train(..) or evaluate(..); see the sketch after this list.

  2. No, we haven't tried that yet.
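As a minimal sketch of the verification suggested in item 1 (assuming run.py was launched with --input_prefix_column, --max_prefix_length 64, and --pad_prefix=True; the tokenizer and dataset variables are placeholders for whatever your run builds):

```python
# Hypothetical check, run right before the call to train(..) or evaluate(..):
# decode one preprocessed example and eyeball where the prefix ended up.
sample = train_dataset[0]
decoded = tokenizer.decode(sample["input_ids"], skip_special_tokens=False)
print(decoded[:500])   # start of the input: the prefix should appear here
print(decoded[-500:])  # end of the input: check whether the prefix repeats per chunk
```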

Please let us know if you have any more questions! Cheers, Uri

apapoudakis commented 10 months ago

Thank you for your response!

By decoding the input, I think that input_prefix_column adds the prefix only at the start of the input (i.e., only before the first chunk). For QA tasks, the prefix should probably be added as in the SLED code, where the prepend_prefix argument is used for that.

Also, I would like to ask if you have tried to compute eval_loss during training; when I add the labels during evaluation I run into memory issues. Did you have any similar problems?

urialon commented 10 months ago

> I think that input_prefix_column adds the prefix only at the start of the input (only before the first chunk).

Right, that's probably correct.

> For QA tasks, the prefix should probably be added as in the SLED code, where the prepend_prefix argument is used for that.

That's correct; we haven't tried that, but I agree that it may lead to even higher gains.
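This is not something the repo does today, but as a rough sketch of the idea behind SLED-style per-chunk prefixing (assuming a simple fixed-size token window; chunk_size and prefix_len are illustrative parameters, not unlimiformer options):

```python
# Illustrative only: prepend the tokenized question to every chunk of a long
# document, in the spirit of SLED's prepend_prefix. Not unlimiformer's current
# preprocessing.
def chunk_with_prefix(tokenizer, question, document, chunk_size=1024, prefix_len=64):
    prefix_ids = tokenizer(question, truncation=True, max_length=prefix_len,
                           add_special_tokens=False)["input_ids"]
    doc_ids = tokenizer(document, add_special_tokens=False)["input_ids"]
    body = chunk_size - len(prefix_ids)  # room left for document tokens per chunk
    # every chunk starts with the same question tokens
    return [prefix_ids + doc_ids[i:i + body] for i in range(0, len(doc_ids), body)]
```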

> Also, I would like to ask if you have tried to compute eval_loss during training

We haven't computed the eval_loss, because we used predict_with_generate (https://github.com/abertsch72/unlimiformer/blob/main/src/run.py#L786C58-L786C79). So we only looked at the validation set's ROUGE and BERTScore during development, hyperparameter tuning, etc.
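For reference, a minimal sketch of that kind of setup with Hugging Face's Seq2SeqTrainer; the model, tokenizer, datasets, and metric function are placeholders, not unlimiformer's actual run.py wiring:

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# With predict_with_generate=True the evaluation loop generates text and scores
# it via compute_metrics (e.g. ROUGE/BERTScore); if the eval set carries no
# labels, no eval_loss is reported, which matches the setup described above.
args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    per_device_eval_batch_size=1,
)
trainer = Seq2SeqTrainer(
    model=model,                      # placeholder: your seq2seq model
    args=args,
    train_dataset=train_dataset,      # placeholder datasets
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # placeholder: ROUGE/BERTScore function
)
metrics = trainer.evaluate()
```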

Let us know if you have any questions! Uri