08/27 17:57:22 - mmengine - INFO - Resume checkpoint from /root/InternLM/work_dir/internvl_ft_trafficsign_multiround/iter_6000.pth
Traceback (most recent call last):
  File "/root/InternLM/code/XTuner/xtuner/tools/train.py", line 360, in <module>
    main()
  File "/root/InternLM/code/XTuner/xtuner/tools/train.py", line 356, in main
    runner.train()
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1195, in train
    self.load_or_resume()
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1141, in load_or_resume
    self.resume(resume_from)
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1456, in resume
    checkpoint = self.strategy.resume(
  File "/root/InternLM/code/XTuner/xtuner/engine/_strategy/deepspeed.py", line 60, in resume
    checkpoint = super().resume(*args, **kwargs)
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/mmengine/strategy/deepspeed.py", line 472, in resume
    _, extra_ckpt = self.model.load_checkpoint(
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2759, in load_checkpoint
    load_path, client_states = self._load_checkpoint(load_dir,
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2809, in _load_checkpoint
    sd_loader = SDLoaderFactory.get_sd_loader(ckpt_list, checkpoint_engine=self.checkpoint_engine)
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 43, in get_sd_loader
    return MegatronSDLoader(ckpt_list, version, checkpoint_engine)
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 193, in __init__
    super().__init__(ckpt_list, version, checkpoint_engine)
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 55, in __init__
    self.check_ckpt_list()
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/deepspeed/runtime/state_dict_factory.py", line 168, in check_ckpt_list
    assert len(self.ckpt_list) > 0
AssertionError
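The assertion fires because DeepSpeed's `check_ckpt_list` received an empty `ckpt_list`, meaning no model-state shard files were found under the resume path. The sketch below is a minimal diagnostic, assuming DeepSpeed's usual naming where each model shard file ends in `model_states.pt` (e.g. `mp_rank_00_model_states.pt`) and an optional `latest` file names a tag subdirectory. The helpers `find_model_shards` and `describe_resume_path` are hypothetical, not part of XTuner, mmengine, or DeepSpeed.

```python
import sys
from pathlib import Path


def find_model_shards(resume_path: str) -> list[Path]:
    """Return model-state shard files found under resume_path.

    Assumption: shard files end in "model_states.pt", as in DeepSpeed's
    usual layout. An empty result mirrors the empty ckpt_list above.
    """
    root = Path(resume_path)
    if not root.exists():
        raise FileNotFoundError(f"resume path does not exist: {root}")
    if root.is_file():
        # A single file cannot contain shard files; DeepSpeed loads from a directory.
        return []
    # rglob also searches tag subdirectories (e.g. one named in a "latest" file).
    return sorted(root.rglob("*model_states.pt"))


def describe_resume_path(resume_path: str) -> str:
    """Summarize what a DeepSpeed resume would find at resume_path."""
    root = Path(resume_path)
    shards = find_model_shards(resume_path)
    lines = [f"path: {root} ({'dir' if root.is_dir() else 'file'})"]
    latest = root / "latest"
    if latest.is_file():
        # DeepSpeed reads this file to pick the tag subdirectory when no tag is given.
        lines.append(f"latest tag: {latest.read_text().strip()!r}")
    if shards:
        lines.extend(f"shard: {p.relative_to(root)}" for p in shards)
    else:
        lines.append("no *model_states.pt shard files found")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe_resume_path(sys.argv[1]))
```

Saved as a hypothetical `check_resume.py`, it would be run as `python check_resume.py /root/InternLM/work_dir/internvl_ft_trafficsign_multiround/iter_6000.pth`. An empty shard list suggests the checkpoint directory is incomplete or was written in a layout this DeepSpeed configuration does not look for.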