shi-kejian opened this issue 10 months ago
Hi @shi-kejian , Thank you for your interest in our work!
We haven't tried training on more than one GPU.
According to your stack trace, maybe it would help to move chunk
to the same GPU as the model, here:
training_unlimiformer.py line 195
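A minimal sketch of what that fix might look like (the variable names `chunk` and `model` come from the comment above; the surrounding setup here is a stand-in, not the actual code at line 195):

```python
import torch

# Stand-ins for the real objects in training_unlimiformer.py:
model_device = torch.device("cpu")   # in practice: next(model.parameters()).device
chunk = torch.randn(2, 4)            # in practice: the input chunk tensor

# Move the chunk to the same device as the model before the forward pass,
# so multi-GPU runs don't mix tensors from different devices.
chunk = chunk.to(model_device)
```

On a multi-GPU setup, `model_device` would be the particular GPU holding the (shard of the) model rather than `"cpu"`.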
If you manage to get it to work, we would love to merge a PR.
Best, Uri
Thank you. I'll try some tweaks. A quick comment: running run.py with --do_predict throws the following error on transformers>=4.30.0 (currently 4.34.0 as of Oct 15, 2023). Downgrading to 4.28.0 solved the problem, so it would be desirable to make this forward-compatible.
Traceback (most recent call last):
File "/storage/home/unlimiformer/src/run.py", line 1180, in
So just to clarify - with 4.28.0
you managed to train on multiple GPUs?
No, sorry for the confusion. It's not about multi-GPU.
With transformers>=4.30.0, running run.py with --do_predict raises this error:
File "/storage/home/unlimiformer/src/run.py", line 837, in main
    trainer.args.predict_with_generate = True  # during prediction, we don't have labels
File "/home/miniconda3/lib/python3.10/site-packages/transformers/training_args.py", line 1712, in __setattr__
    raise FrozenInstanceError(f"cannot assign to field {name}")
dataclasses.FrozenInstanceError: cannot assign to field predict_with_generate
Downgrading to 4.28.0
made --do_predict work.
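A forward-compatible alternative to downgrading might be to build a modified copy of the arguments instead of mutating them in place. The sketch below reproduces the failure mode with a plain frozen stdlib dataclass (the `TrainingArgs` class here is a stand-in for the real transformers class, which newer versions make effectively immutable after construction):

```python
import dataclasses

# Stand-in for transformers' TrainingArguments, which raises
# FrozenInstanceError on attribute assignment in newer versions.
@dataclasses.dataclass(frozen=True)
class TrainingArgs:
    predict_with_generate: bool = False

args = TrainingArgs()
try:
    # In-place mutation, as run.py does today; fails when args are frozen.
    object.__setattr__  # (shown for contrast; direct assignment is below)
    args.predict_with_generate = True
except dataclasses.FrozenInstanceError:
    # Version-agnostic workaround: create a fresh copy with the field changed.
    args = dataclasses.replace(args, predict_with_generate=True)

print(args.predict_with_generate)  # True
```

Whether `dataclasses.replace` round-trips cleanly through the real `Seq2SeqTrainingArguments.__post_init__` would need to be verified against the installed transformers version; passing `predict_with_generate=True` at construction time is another option.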
Thank you.
Hello again,
Thanks for your continued effort.
I'm running Unlimiformer training on gov_report (the standard finetuning command from your README, with the Unlimiformer flags added):
All other configs are default.
The multi-GPU setting gives me the following error, and I couldn't find a fix; single-GPU works fine.
I'm curious whether you see similar issues when running the latest main commit.
Thank you!