THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model
Apache License 2.0
4.07k stars 414 forks

Running finetune_224_lora.sh fails with "please pass LOCAL_WORLD_SIZE environment variable." #298

Open xuxuxuchen opened 10 months ago

xuxuxuchen commented 10 months ago

I have already added the LOCAL_WORLD_SIZE environment variable to .deepspeed_env, but the model still raises the error. How can I fix this?

1049451037 commented 10 months ago


xuxuxuchen commented 10 months ago

.deepspeed_env is in the CogVLM-main folder and contains: SAT_HOME=~/.sat_models LOCAL_WORLD_SIZE=8

The error message is:

  File "/work/home/CogVLM-main/", line 239, in
    model, args = FineTuneTrainCogVLMModel.from_pretrained(args.from_pretrained, args, overwrite_args={'model_parallel_size': args.model_parallel_size} if args.model_parallel_size != 1 else {})
  File "/opt/conda/lib/python3.10/site-packages/sat/model/", line 220, in from_pretrained
    local_rank = get_node_rank()
  File "/opt/conda/lib/python3.10/site-packages/sat/mpu/", line 144, in get_node_rank
    return torch.distributed.get_rank(group=get_node_group())
  File "/opt/conda/lib/python3.10/site-packages/sat/mpu/", line 122, in get_node_group
    assert _NODE_GROUP is not None, \
AssertionError: node group is not initialized, please pass LOCAL_WORLD_SIZE environment variable.
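One thing worth ruling out is whether the values written to .deepspeed_env actually reach the training process. The sketch below is only a diagnostic aid, not part of sat or DeepSpeed: it assumes the file holds one KEY=VALUE entry per line, and the helper names `parse_env_file` and `missing_from_process` are hypothetical.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines; blank lines and lines without '=' are skipped.

    Assumption: one entry per line, which is how this sketch reads the file.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def missing_from_process(entries, environ=None):
    """Return the keys whose file value the given environment does not carry."""
    environ = os.environ if environ is None else environ
    return sorted(k for k, v in entries.items() if environ.get(k) != v)


if __name__ == "__main__":
    # Sample content mirroring the two entries mentioned in the comment above.
    sample = "SAT_HOME=~/.sat_models\nLOCAL_WORLD_SIZE=8\n"
    print("not visible in this process:", missing_from_process(parse_env_file(sample)))
```

Calling `missing_from_process` from inside the training script (for example just before the `from_pretrained` call) would show whether LOCAL_WORLD_SIZE is set where the assertion is raised.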

xuxuxuchen commented 10 months ago


1049451037 commented 10 months ago


1049451037 commented 10 months ago


git clone
cd SwissArmyTransformer
pip install .


xuxuxuchen commented 10 months ago

After reinstalling, I get this error: TypeError: BaseFileLock.__init__() got an unexpected keyword argument 'mode'

    model, args = FineTuneTrainCogVLMModel.from_pretrained(args.from_pretrained, args, overwrite_args={'model_parallel_size': args.model_parallel_size} if args.model_parallel_size != 1 else {})
  File "/opt/conda/lib/python3.10/site-packages/sat/model/", line 219, in from_pretrained
    model, model_args = cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=True, overwrite_args=overwrite_args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sat/model/", line 201, in from_pretrained_base
    model_path = auto_create(name, path=home_path, url=url)
  File "/opt/conda/lib/python3.10/site-packages/sat/resources/", line 50, in auto_create
    lock = FileLock(model_path + '.lock', mode=0o777)
TypeError: BaseFileLock.__init__() got an unexpected keyword argument 'mode'

If I delete mode=0o777, the model runs, and then it reports the LOCAL_WORLD_SIZE error again.
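The TypeError suggests that the installed filelock does not accept the `mode` keyword that sat passes. A minimal sketch for checking this without editing library code follows; `accepts_keyword` is a hypothetical helper, and it only inspects the call signature, so it says nothing about how the argument is used.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        from filelock import FileLock
    except ImportError:
        print("filelock is not installed in this environment")
    else:
        print("FileLock accepts mode:", accepts_keyword(FileLock.__init__, "mode"))
```

If this prints False, upgrading filelock is likely a safer fix than deleting the argument from the installed package.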

1049451037 commented 10 months ago

pip install -U filelock


1049451037 commented 10 months ago


xuxuxuchen commented 10 months ago

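1. OK, understood. I won't go tinkering with the code on my own next time.
2. After running pip install -U filelock, it still reports the LOCAL_WORLD_SIZE error.
3. How do I re-download the model? The server cannot use git clone, so I downloaded the sat zip from git, unzipped it, and ran pip install . on the command line. Before that I also ran pip uninstall on the original package. pip list now shows SwissArmyTransformer 0.4.8.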
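When a package has been uninstalled and reinstalled from a zip, it can help to confirm which copy the interpreter will actually import. The sketch below uses only the standard library; the distribution name SwissArmyTransformer comes from the pip list output above and the module name `sat` from the traceback paths, while the helper names are hypothetical.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_location(module_name):
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for dist in ("SwissArmyTransformer", "filelock"):
        print(dist, "version:", installed_version(dist))
    print("sat would be imported from:", module_location("sat"))
```

If the reported location is not the site-packages directory that pip installed into, an older copy is probably shadowing the new one.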

1049451037 commented 10 months ago


git clone
cd SwissArmyTransformer
pip install .

1049451037 commented 10 months ago

xuxuxuchen commented 10 months ago

1. OK, I'll look for an alternative to git clone.
2. I see that lines 84-91 of the link you shared still read this environment variable with os.environ.get('LOCAL_WORLD_SIZE', None). Also, my error comes from the assert on line 126, 'node group is not initialized, please pass LOCAL_WORLD_SIZE environment ', which is not actually directly related to LOCAL_WORLD_SIZE.

guess_local_world_size = world_size if world_size < 8 else 8
local_world_size = os.environ.get('LOCAL_WORLD_SIZE', None)
if local_world_size is None:
    local_world_size = guess_local_world_size
    print_rank0(f"You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE={guess_local_world_size}. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.")
local_world_size = int(local_world_size)
# Build the node groups.
global _NODE_GROUP

line 126:
assert _NODE_GROUP is not None, \
    'node group is not initialized, please pass LOCAL_WORLD_SIZE environment variable.'
return _NODE_GROUP
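The point in item 2 can be illustrated with a self-contained sketch. It mirrors the fallback logic quoted above, but it is not sat's code: `resolve_local_world_size` and `init_node_group` are hypothetical names, and the "group" is a plain tuple standing in for a real torch.distributed process group.

```python
import os

_NODE_GROUP = None  # stand-in for the module-level global in the quoted excerpt


def resolve_local_world_size(world_size, environ=None):
    """Use LOCAL_WORLD_SIZE if set, otherwise guess min(world_size, 8)."""
    environ = os.environ if environ is None else environ
    guess = world_size if world_size < 8 else 8
    value = environ.get("LOCAL_WORLD_SIZE")
    return guess if value is None else int(value)


def init_node_group(world_size, environ=None):
    """Hypothetical initializer: stores a placeholder instead of a process group."""
    global _NODE_GROUP
    _NODE_GROUP = ("node-group", resolve_local_world_size(world_size, environ))


def get_node_group():
    """Fail the same way as the quoted assert if initialization never ran."""
    assert _NODE_GROUP is not None, "node group is not initialized"
    return _NODE_GROUP
```

Under these assumptions, the assertion depends only on whether the initializer ran: with the fallback in place, a missing variable yields a guessed value rather than an error, so an assertion failure would point to an installation that lacks the fallback or to the initialization step being skipped.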

1049451037 commented 10 months ago


xuxuxuchen commented 10 months ago

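Hello, sorry to bother you again. Since git clone really isn't available on the server, I cloned the repository locally, copied it to the server, and ran pip install . there. I still get exactly the same error: AssertionError: node group is not initialized, please pass LOCAL_WORLD_SIZE environment variable. Could you help me think of any other possible causes?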

1049451037 commented 10 months ago


xuxuxuchen commented 10 months ago
