abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

Reproduce the +test Unlimiformer setup #17

Closed Leonard907 closed 1 year ago

Leonard907 commented 1 year ago

Hi, I want to reproduce the results of the +test Unlimiformer setup from the paper. Based on my understanding, this setup does not require training, so is it possible to load an available checkpoint (like this) and convert it to Unlimiformer like the example demonstrated in inference-example.py? Are there any settings that I have omitted here? Thanks!
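
Concretely, something along these lines is what I have in mind (a rough sketch following inference-example.py; the unlimiformer import path and the Unlimiformer.convert_model call are assumptions on my side, to be checked against that script):

# Sketch only: load a standard-length BART checkpoint fine-tuned on GovReport
# and apply Unlimiformer at test time, with no further training.
# Assumes the repo's src/ directory is on PYTHONPATH; convert_model is my guess
# at the API used in inference-example.py.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from unlimiformer import Unlimiformer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("abertsch/bart-base-govreport")

model = Unlimiformer.convert_model(model)  # assumed signature; may need extra kwargs
model.eval()

long_report = "..."  # a full GovReport document; no truncation needed
inputs = tokenizer(long_report, return_tensors="pt", truncation=False)
summary_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))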

urialon commented 1 year ago

Hi @Leonard907, thank you for your interest in our work!

Yes, to reproduce these experiments, follow this section of the README.

Specifically, you should take the main command line:

python src/run.py \
    src/configs/model/bart_base_sled.json \
    src/configs/training/base_training_args.json \
    src/configs/data/gov_report.json \
    --output_dir output_train_bart_base_local/ \
    --learning_rate 1e-5 \
    --model_name_or_path facebook/bart-base \
    --max_source_length 1024 \
    --eval_max_source_length 1024 --do_eval=True \
    --eval_steps 1000 --save_steps 1000 \
    --per_device_eval_batch_size 1 --per_device_train_batch_size 2 \
    --extra_metrics bertscore

and add --test_unlimiformer --eval_max_source_length 999999 --model_name_or_path abertsch/bart-base-govreport. Let us know if you have any issues or questions!
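
Assembled, that gives roughly the following (a sketch; the added flags simply replace the 1024-token eval length and the facebook/bart-base checkpoint in the command above):

python src/run.py \
    src/configs/model/bart_base_sled.json \
    src/configs/training/base_training_args.json \
    src/configs/data/gov_report.json \
    --output_dir output_train_bart_base_local/ \
    --learning_rate 1e-5 \
    --model_name_or_path abertsch/bart-base-govreport \
    --max_source_length 1024 \
    --eval_max_source_length 999999 --do_eval=True \
    --eval_steps 1000 --save_steps 1000 \
    --per_device_eval_batch_size 1 --per_device_train_batch_size 2 \
    --extra_metrics bertscore --test_unlimiformer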

Best, Uri

Leonard907 commented 1 year ago

Thank you very much!

lzp870 commented 1 year ago

Hello, I ran the main command line you listed, but I get a "srcIndex < srcSelectDimSize" error. When I delete "--eval_max_source_length 999999" the issue goes away. What should I do to run the command with that flag?

lzp870 commented 1 year ago

Also, is it necessary to set "--use_datastore=True"?

urialon commented 1 year ago

Hi @lzp870, it works for me. The only thing that was missing was adding --tokenizer_name facebook/bart-base, but we will add the tokenizer to the model so it won't be needed in the future.

Setting --use_datastore is useful with extremely long inputs, but it should work either way.

Can you try to (1) git pull the latest version, and (2) run the following exact command line (test only, no training)?

python src/run.py \
    src/configs/model/bart_base_sled.json \
    src/configs/training/base_training_args.json \
    src/configs/data/gov_report.json \
    --output_dir output_train_bart_base_local/ \
    --learning_rate 1e-5 \
    --model_name_or_path facebook/bart-base \
    --max_source_length 1024 \
    --eval_max_source_length 999999 --do_eval=True --do_train=False \
    --eval_steps 1000 --save_steps 1000 \
    --per_device_eval_batch_size 1 --per_device_train_batch_size 2 \
    --extra_metrics bertscore --test_unlimiformer \
    --model_name_or_path abertsch/bart-base-govreport \
    --tokenizer_name facebook/bart-base

abertsch72 commented 1 year ago

Following up on the above: the tokenizer has now been added to the model! It should now run without explicitly setting --tokenizer_name.
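
In other words, the tokenizer should now load directly from the checkpoint repo (a quick sanity check, assuming the tokenizer files were indeed pushed to the hub):

from transformers import AutoTokenizer

# With the tokenizer files bundled into the checkpoint repo, this should load
# the bart-base tokenizer without needing --tokenizer_name facebook/bart-base.
tokenizer = AutoTokenizer.from_pretrained("abertsch/bart-base-govreport")
print(tokenizer)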

urialon commented 1 year ago

Closing due to inactivity, feel free to re-open or create a new issue if you have any questions or problems.