abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

error while training #42

Closed by kekekawaii2839 11 months ago

kekekawaii2839 commented 11 months ago

Hi, I tried to train BART with inputs longer than 1024 tokens using the following command:

CUDA_VISIBLE_DEVICES=0 python src/run.py \
    src/configs/model/bart_base_sled.json \
    src/configs/training/base_training_args.json \
    src/configs/data/gov_report.json \
    --output_dir output_train_bart_base_local/ \
    --learning_rate 1e-5 \
    --model_name_or_path facebook/bart-base \
    --max_source_length 16384 \
    --eval_max_source_length 1024 --do_eval=True \
    --eval_steps 1000 --save_steps 1000 \
    --per_device_eval_batch_size 1 --per_device_train_batch_size 2 \
    --extra_metrics bertscore --unlimiformer_training

And I got many errors like this:

/opt/conda/conda-bld/pytorch_1682343962757/work/aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [334,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

But as long as max_source_length is 1024 or smaller, training succeeds. Any clues on what is going wrong?
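For context, a minimal sketch of what this CUDA assertion usually signals (an assumption about the cause, not code from this repo): `srcIndex < srcSelectDimSize` typically fails when an index into an embedding table exceeds the table's size, e.g. position ids beyond BART's learned position-embedding limit. On CPU the same lookup raises a plain `IndexError`, which is much easier to debug:

```python
import torch
import torch.nn as nn

# Hypothetical reproduction: an embedding table with 1024 rows, standing in
# for BART's learned position embeddings (sizes here are illustrative).
embedding = nn.Embedding(num_embeddings=1024, embedding_dim=8)

ok = embedding(torch.tensor([0, 1023]))  # indices in range: lookup succeeds
print(ok.shape)                          # torch.Size([2, 8])

try:
    embedding(torch.tensor([1024]))      # index out of range
except IndexError as e:
    # On CPU this raises IndexError; on CUDA the same bug surfaces as the
    # device-side assertion `srcIndex < srcSelectDimSize`.
    print("IndexError:", e)
```

Re-running a failing script with `CUDA_VISIBLE_DEVICES=""` (CPU only) is a common way to turn the opaque device-side assertion into a readable stack trace.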

urialon commented 11 months ago

Hi @kekekawaii2839, thank you for your interest in our work!

I think we made a mistake in the example command line; can you please try removing the src/configs/model/bart_base_sled.json argument?

Also, since you are training with long context, it makes the most sense to evaluate with long inputs as well, by adding --test_unlimiformer --eval_max_source_length 999999
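Putting both suggestions together, the corrected command might look like this (a sketch based on the suggestions above, not a command verified end to end):

```shell
CUDA_VISIBLE_DEVICES=0 python src/run.py \
    src/configs/training/base_training_args.json \
    src/configs/data/gov_report.json \
    --output_dir output_train_bart_base_local/ \
    --learning_rate 1e-5 \
    --model_name_or_path facebook/bart-base \
    --max_source_length 16384 \
    --eval_max_source_length 999999 --do_eval=True \
    --eval_steps 1000 --save_steps 1000 \
    --per_device_eval_batch_size 1 --per_device_train_batch_size 2 \
    --extra_metrics bertscore --unlimiformer_training --test_unlimiformer
```

Note that the bart_base_sled.json model config has been dropped and --test_unlimiformer with the larger --eval_max_source_length has been added.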

Let me know how it goes. Best, Uri

kekekawaii2839 commented 11 months ago

Great! It works! Thanks for your wonderful work again!