THUDM / CogVLM

a state-of-the-art-level open visual language model | multimodal pretrained model
Apache License 2.0

ValueError: model_parallel_size is inconsistent with prior configuration.We currently do not support changing model_parallel_size. #419

Open Hakan-Khenda opened 5 months ago

Hakan-Khenda commented 5 months ago

Traceback (most recent call last):
  File "/home/sagemaker-user/CogVLM/basic_demo/cli_demo_sat.py", line 162, in <module>
    main()
  File "/home/sagemaker-user/CogVLM/basic_demo/cli_demo_sat.py", line 37, in main
    model, model_args = AutoModel.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 340, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 332, in from_pretrained_base
    model = get_model(args, model_cls, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 417, in get_model
    model = model_cls(args, params_dtype=params_dtype, **kwargs)
  File "/home/sagemaker-user/CogVLM/utils/models/cogvlm_model.py", line 125, in __init__
    super().__init__(args, transformer=transformer, **kw_args)
  File "/home/sagemaker-user/CogVLM/utils/models/cogvlm_model.py", line 104, in __init__
    self.add_mixin("eva", ImageMixin(args))
  File "/home/sagemaker-user/CogVLM/utils/models/cogvlm_model.py", line 77, in __init__
    self.vit_model = EVA2CLIPModel(EVA2CLIPModel.get_args(**vars(vit_args)))
  File "/home/sagemaker-user/CogVLM/utils/models/eva_clip_model.py", line 110, in __init__
    super().__init__(args, transformer=transformer, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sat/model/base_model.py", line 89, in __init__
    success = _simple_init(model_parallel_size=args.model_parallel_size)
  File "/opt/conda/lib/python3.10/site-packages/sat/arguments.py", line 322, in _simple_init
    if initialize_distributed(args): # first time init model parallel, print warning
  File "/opt/conda/lib/python3.10/site-packages/sat/arguments.py", line 500, in initialize_distributed
    raise ValueError('model_parallel_size is inconsistent with prior configuration.'
ValueError: model_parallel_size is inconsistent with prior configuration.We currently do not support changing model_parallel_size.

I get the error above when I try to run inference with a model that I fine-tuned on a Captcha dataset using MP_SIZE 8, Per_Worker 8, and WORLD_SIZE 8. I have already completed the merge operation.
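For reference, the traceback shows the error being raised while the nested EVA2CLIPModel is constructed: its `__init__` calls `_simple_init(model_parallel_size=args.model_parallel_size)`, and `initialize_distributed` rejects the value as inconsistent with what was configured earlier. The general pattern can be illustrated with a minimal sketch. This is not the sat source: the names `_configured_mp_size`, `init_model_parallel`, and `demo` are hypothetical, and the sizes 8 and 1 are assumed values chosen only to trigger the check.

```python
# Hypothetical stand-in for a library's global model-parallel state.
# None of these names come from sat; they only illustrate the pattern.
_configured_mp_size = None


def init_model_parallel(model_parallel_size):
    """Record the model-parallel size on first call; reject a different size later.

    Returns True if this call performed the first-time initialization,
    False if the same size had already been configured.
    """
    global _configured_mp_size
    if _configured_mp_size is None:
        _configured_mp_size = model_parallel_size
        return True
    if _configured_mp_size != model_parallel_size:
        raise ValueError(
            "model_parallel_size is inconsistent with prior configuration."
        )
    return False


def demo():
    """Initialize with size 8 (assumed outer model), then request size 1 (assumed nested model)."""
    init_model_parallel(8)
    try:
        init_model_parallel(1)
    except ValueError as exc:
        return str(exc)
    return "no error"
```

If the real cause follows this pattern, the thing to check would be whether the model-parallel size stored with the merged checkpoint (including the arguments for the vision model) matches the size the inference script initializes with. That is a guess based on the traceback, not a confirmed diagnosis.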

Akhim-yun commented 2 weeks ago

I am running into this problem too.