I encountered the following error while training with `vicuna-13b-v1.1`:
```
  File "/root/MiniGPT-4/minigpt4/models/minigpt_base.py", line 41, in __init__
    self.llama_model, self.llama_tokenizer = self.init_llm(
  File "/root/MiniGPT-4/minigpt4/models/base_model.py", line 185, in init_llm
    llama_model = LlamaForCausalLM.from_pretrained(
  File "/root/anaconda3/envs/minigptv/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/anaconda3/envs/minigptv/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3278, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
	size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32001, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
	size mismatch for lm_head.weight: copying a param with shape torch.Size([32001, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
	You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 249178) of binary: /root/anaconda3/envs/minigptv/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/minigptv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/minigptv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/minigptv/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/root/anaconda3/envs/minigptv/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/anaconda3/envs/minigptv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/minigptv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```
May I ask how to solve it? Thanks.
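For what it's worth, the shapes in the error (32001 rows in the checkpoint vs 32000 in the freshly built model) suggest the checkpoint was saved with one extra token in the vocabulary (commonly a pad token added to the tokenizer), so the embedding and `lm_head` matrices no longer match the default LLaMA config. A minimal sketch of the idea, using a tiny `LlamaConfig` purely for illustration (the real model uses `hidden_size=5120`; the path and exact fix for your checkpoint are assumptions, not something I can verify from the log alone):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny config for illustration only; vicuna-13b-v1.1 actually uses
# vocab_size=32000 and hidden_size=5120.
cfg = LlamaConfig(
    vocab_size=32000,
    hidden_size=16,
    intermediate_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
)
model = LlamaForCausalLM(cfg)

# The checkpoint carries one extra token (32001 rows), so the model built
# from the default config (32000 rows) cannot accept its state_dict.
# Resizing the token embeddings to the checkpoint's vocab size makes the
# shapes of model.embed_tokens.weight and lm_head.weight line up.
model.resize_token_embeddings(32001)

print(tuple(model.get_input_embeddings().weight.shape))  # (32001, 16)
print(tuple(model.lm_head.weight.shape))                 # (32001, 16)
```

In practice this usually means either (a) making sure the tokenizer/config shipped with the Vicuna checkpoint directory is the one being loaded (so `from_pretrained` builds the model with 32001 to begin with), or (b) applying the Vicuna delta weights correctly with FastChat's `apply_delta` tooling so the merged checkpoint and config agree. `ignore_mismatched_sizes=True` would silently reinitialize those two matrices, which is probably not what you want for a pretrained model.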