File "train.py", line 14, in <module>
cli_main()
File "/data1/mzlv/g-transformer/fairseq_cli/train.py", line 347, in cli_main
cli_main_helper(args)
File "/data1/mzlv/g-transformer/fairseq_cli/train.py", line 374, in cli_main_helper
fn=distributed_main, args=(args,), nprocs=args.distributed_world_size
File "/home/mzlv/anaconda3/envs/gtrans_test_cu10/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/mzlv/anaconda3/envs/gtrans_test_cu10/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/mzlv/anaconda3/envs/gtrans_test_cu10/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/data1/mzlv/g-transformer/fairseq_cli/train.py", line 336, in distributed_main
main(args, init_distributed=True)
File "/data1/mzlv/g-transformer/fairseq_cli/train.py", line 111, in main
extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
File "/data1/mzlv/g-transformer/fairseq/checkpoint_utils.py", line 138, in load_checkpoint
strict=not getattr(args, "load_partial", False)
File "/data1/mzlv/g-transformer/fairseq/trainer.py", line 334, in load_checkpoint
self.optimizer.load_state_dict(last_optim_state, optimizer_overrides)
File "/data1/mzlv/g-transformer/fairseq/optim/fairseq_optimizer.py", line 72, in load_state_dict
self.optimizer.load_state_dict(state_dict)
File "/home/mzlv/anaconda3/envs/gtrans_test_cu10/lib/python3.6/site-packages/torch/optim/optimizer.py", line 111, in load_state_dict
raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups
Loading the sentence-level model for document-level fine-tuning works fine. However, if fine-tuning is interrupted and then resumed from checkpoint_last.pt, the error above is raised. Does an interrupted g-transformer fine-tuning run have to be restarted from scratch, or is there an additional flag that needs to be specified?
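For reference, PyTorch raises this ValueError whenever the optimizer being restored was built with a different number of parameter groups than the optimizer that wrote the checkpoint. A minimal sketch with toy parameters (not the g-transformer code, just the failure mode) reproduces it:

import torch

# Two toy parameter tensors.
p1 = torch.nn.Parameter(torch.zeros(2))
p2 = torch.nn.Parameter(torch.zeros(2))

# Optimizer saved with TWO parameter groups (e.g. per-layer learning rates).
opt_saved = torch.optim.Adam([{"params": [p1]}, {"params": [p2]}])
state = opt_saved.state_dict()

# Optimizer rebuilt with ONE parameter group, as happens when the optimizer
# configuration at resume time differs from the one used during training.
opt_new = torch.optim.Adam([p1, p2])

# Raises: ValueError: loaded state dict has a different number of parameter groups
opt_new.load_state_dict(state)

So the question is whether the fine-tuning run recreates the optimizer with a different group layout than the one stored in checkpoint_last.pt. As a possible workaround (untested here), fairseq's --reset-optimizer flag skips loading the optimizer state entirely, resuming from the checkpoint weights with a freshly initialized optimizer, at the cost of losing the accumulated Adam moments and step count.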