abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

multi-gpu unlimiformer training: Expected all tensors to be on the same device #52

Open shi-kejian opened 10 months ago

shi-kejian commented 10 months ago

Hello again,

Thanks again for your work on this.

I am running Unlimiformer training on gov_report (the standard finetuning command from your README, with the Unlimiformer flags added):

python src/run.py \
    src/configs/training/base_training_args.json \
    src/configs/data/gov_report.json \
    --output_dir output_train_bart_base_local/ \
    --learning_rate 1e-5 \
    --unlimiformer_training \
    --max_source_length 16384 \
    --test_unlimiformer  \
    --model_name_or_path facebook/bart-base \
    --max_source_length 1024 \
    --eval_max_source_length 999999 --do_eval=True \
    --eval_steps 1000 --save_steps 1000 \
    --per_device_eval_batch_size 1 --per_device_train_batch_size 2 \
    --extra_metrics bertscore

All other configs are default.

In the multi-GPU setting I get the following error and couldn't find a fix; single-GPU training works fine.

RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/storage/home/research//unlimiformer/src/random_training_unlimiformer.py", line 163, in random_inputs_forward_hook
    self.long_inputs_encoded, self.long_inputs_mask = self.chunked_encode_input(input_ids=input_ids, attention_mask=attention_mask)
  File "/storage/home/research//unlimiformer/src/random_training_unlimiformer.py", line 195, in chunked_encode_input
    output = self.model.base_model.encoder(chunk, attention_mask=chunk_attention_mask, return_dict=True, output_hidden_states=True)
  File "/home/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/miniconda3/lib/python3.10/site-packages/transformers/models/bart/modeling_bart.py", line 818, in forward
    inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
  File "/home/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/miniconda3/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/miniconda3/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)
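
For reference, one way to force a single-GPU run (assuming a standard CUDA setup; this is just an illustration) is to restrict device visibility:

    CUDA_VISIBLE_DEVICES=0 python src/run.py \
        src/configs/training/base_training_args.json \
        src/configs/data/gov_report.json \
        ... (remaining flags as above)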

I am curious whether you see similar issues when running the latest main commit.

Thank you!

urialon commented 10 months ago

Hi @shi-kejian, thank you for your interest in our work!

We haven't tried training on more than one GPU.

According to your stack trace, it may help to move chunk to the same GPU as the model, here: training_unlimiformer.py line 195.
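
A minimal sketch of that change (untested on multiple GPUs; it assumes the chunk and chunk_attention_mask tensors from chunked_encode_input in your stack trace, and moves them to the device of the encoder's parameters before the forward call):

    # inside chunked_encode_input, just before the encoder call (~line 195)
    encoder = self.model.base_model.encoder
    encoder_device = next(encoder.parameters()).device  # device this replica's weights live on
    chunk = chunk.to(encoder_device)
    chunk_attention_mask = chunk_attention_mask.to(encoder_device)
    output = encoder(chunk, attention_mask=chunk_attention_mask,
                     return_dict=True, output_hidden_states=True)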

If you manage to get it to work, we would love to merge a PR.

Best, Uri

shi-kejian commented 10 months ago

Thank you, I'll try some tweaks. A quick side note: running run.py with --do_predict throws the following error on transformers>=4.30.0 (currently 4.34.0, as of Oct. 15, 2023). Downgrading to 4.28.0 solved the problem, so it may be desirable to make this forward-compatible.

Traceback (most recent call last):
  File "/storage/home/unlimiformer/src/run.py", line 1180, in <module>
    main()
  File "/storage/home/unlimiformer/src/run.py", line 837, in main
    trainer.args.predict_with_generate = True  # during prediction, we don't have labels
  File "/home/miniconda3/lib/python3.10/site-packages/transformers/training_args.py", line 1712, in __setattr__
    raise FrozenInstanceError(f"cannot assign to field {name}")
dataclasses.FrozenInstanceError: cannot assign to field predict_with_generate
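
If staying on a newer transformers is preferred, one possible stopgap (untested, and assuming the newer versions only block plain attribute assignment on TrainingArguments after initialization) is to bypass the check where run.py sets the flag:

    # in run.py main(), replacing: trainer.args.predict_with_generate = True
    # newer transformers raises FrozenInstanceError on plain assignment,
    # so write the attribute directly instead
    object.__setattr__(trainer.args, "predict_with_generate", True)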

urialon commented 10 months ago

So just to clarify - with 4.28.0 you managed to train on multiple GPUs?

shi-kejian commented 10 months ago

No, sorry for the confusion; it's not about multi-GPU. With transformers>=4.30.0 there is an error when running run.py with --do_predict:

  File "/storage/home/unlimiformer/src/run.py", line 837, in main
    trainer.args.predict_with_generate = True  # during prediction, we don't have labels
  File "/home/miniconda3/lib/python3.10/site-packages/transformers/training_args.py", line 1712, in __setattr__
    raise FrozenInstanceError(f"cannot assign to field {name}")
dataclasses.FrozenInstanceError: cannot assign to field predict_with_generate

Downgrading to 4.28.0 made --do_predict work. Thank you.