Foehnc closed this issue 11 months ago
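Are you running this Python file on its own, or are you using the accelerate library, as described in https://github.com/RUCAIBox/TextBox/blob/2.0.0/asset/efficient_training.md ?
Hi, this looks like a version issue: accelerate removed this function from GradientState. You can roll back to accelerate 0.15.0. If you are not using the accelerate library, you can delete this line of code: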
self.accelerator.gradient_state._set_end_of_dataloader(False)
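If you would rather not pin the version or delete the line, one possible workaround is to make the call only when the method exists. This is a minimal sketch, not the TextBox maintainers' fix: the helper name `reset_end_of_dataloader` and the two stand-in classes are hypothetical and exist only so the snippet runs without accelerate installed. In trainer.py you would pass `self.accelerator.gradient_state` instead.

```python
def reset_end_of_dataloader(gradient_state):
    """Call the private setter only if this object still provides it.

    Returns True when the call was made, False when it was skipped.
    """
    setter = getattr(gradient_state, "_set_end_of_dataloader", None)
    if callable(setter):
        setter(False)
        return True
    return False


# Hypothetical stand-ins for illustration only (not accelerate classes).
class OldStyleState:
    """Mimics a GradientState that still has the private setter."""

    def __init__(self):
        self.end_of_dataloader = True

    def _set_end_of_dataloader(self, value):
        self.end_of_dataloader = value


class NewStyleState:
    """Mimics a GradientState where the setter has been removed."""


if __name__ == "__main__":
    old_state = OldStyleState()
    print(reset_end_of_dataloader(old_state), old_state.end_of_dataloader)
    print(reset_end_of_dataloader(NewStyleState()))
```

Note that skipping the call avoids the AttributeError but does not guarantee identical training behavior on newer accelerate versions; whether the reset is still needed there is not confirmed in this thread.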
Describe the bug
After setting up the environment with install.sh, accelerate 0.23.0 produces the following error:
  File "/home/workspace/TextBox/textbox/utils/dashboard.py", line 311, in new_experiment
    yield True
  File "/home/workspace/TextBox/textbox/quick_start/experiment.py", line 140, in run
    self._do_train_and_valid()
  File "/home/workspace/TextBox/textbox/quick_start/experiment.py", line 115, in _do_train_and_valid
    self.valid_result = self.trainer.fit(train_data, valid_data)
  File "/home/workspace/TextBox/textbox/trainer/trainer.py", line 452, in fit
    loss = self._train_epoch(train_data, epoch_idx, valid_data)['loss']
  File "/home/workspace/TextBox/textbox/trainer/trainer.py", line 236, in _train_epoch
    self.accelerator.gradient_state._set_end_of_dataloader(False)
AttributeError: 'GradientState' object has no attribute '_set_end_of_dataloader'
Installing the older accelerate 0.20.3 (the lowest version that matches the required environment) still raises the same error.

To reproduce
python run_textbox.py \
    --use_gpu=True \
    --gpu_id=1 \
    --model=Chinese-BART \
    --model_path=pretrained_models/bart-base-chinese \
    --dataset=csl \
    --do_train=True \
    --do_valid=True \
    --do_test=True \
    --epochs=5 \
    --train_batch_size=32 \
    --eval_batch_size=32 \
    --max_save=0 \
    --valid_strategy=epoch \
    --valid_steps=1 \
    --filename=DEBUG \
    --wandb=disabled
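Since the error appears on both 0.23.0 and 0.20.3, it may help to confirm which accelerate version the interpreter running the command above actually sees before trying another downgrade. The following is a small sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the same Python environment that runs run_textbox.py.
    print("accelerate:", installed_version("accelerate"))
```

If the printed version differs from the one you installed, the training script is probably running in a different environment than the one pip modified.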