Tian14267 opened 1 year ago
Even after adjusting to max_len = 1024, it still runs out of memory.
You need 8 A100s. You can refer to this repo; int8 finetuning is now supported as well: https://github.com/yangzhipeng1108/moss-finetune-and-moss-finetune-int8
@yangzhipeng1108 I'd like to ask: can a single A100 load the full model? If so, why aren't 4 cards enough? I don't quite understand. Also, can int8 be finetuned? I'll give it a try.
deepspeed_config: zero_stage: 3 — MOSS's config uses DeepSpeed ZeRO stage 3, so the model is sharded across the cards rather than fully loaded on each one. With batch_size = 1 on 8 cards, the peak usage on every card still exceeds 70 GB. The official repo doesn't provide int8 finetuning; I've adapted it, and int8 finetuning works now.
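For reference, a ZeRO stage 3 config along those lines would look roughly like the sketch below. The field names follow the standard DeepSpeed JSON schema, but the specific values are illustrative assumptions, not the settings used in the repo linked above.

```python
# Minimal ZeRO stage 3 DeepSpeed config sketch (values are assumptions, not the repo's actual config).
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # batch_size = 1 per card, as described above
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},             # A100/A800 support bf16 mixed precision
    "zero_optimization": {
        "stage": 3,                        # shard parameters, gradients, and optimizer states across all ranks
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Typically written out as JSON and referenced by the training script or launcher.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

With stage 3, no single card ever holds the full model, yet the per-card peak can still exceed 70 GB for a 16B model once activations and optimizer state are counted in.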
@yangzhipeng1108 I see what you mean. For actual training, the model still has to be sharded across the cards, right? MOSS hasn't open-sourced that. I wonder whether you have an implementation of it; for now I can only use int8.
With just 8 A100s or A800s you can finetune using the official implementation. I only turned to int8 because half-precision inference of the 16B model is too slow after the official finetune; in my project int8 finetuning works.
@yangzhipeng1108 Got it. By the way, when training with int8 I ran into some problems running run_int8_acc.sh:
Traceback (most recent call last):
File "finetune_moss_int8_acc.py", line 319, in <module>
train(args)
File "finetune_moss_int8_acc.py", line 177, in train
model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path, trust_remote_code=True, use_cache=False)
File "/opt/conda/envs/moss/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 458, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/envs/moss/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2276, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/local/modeling_moss.py", line 608, in __init__
self.quantize(config.wbits, config.groupsize)
File "/root/.cache/huggingface/modules/transformers_modules/local/modeling_moss.py", line 732, in quantize
from .quantization import quantize_with_gptq
File "/root/.cache/huggingface/modules/transformers_modules/local/quantization.py", line 8, in <module>
from .custom_autotune import *
ModuleNotFoundError: No module named 'transformers_modules.local.custom_autotune'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 39133) of binary: /opt/conda/envs/moss/bin/python3.8
Traceback (most recent call last):
File "/opt/conda/envs/moss/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/envs/moss/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/opt/conda/envs/moss/lib/python3.8/site-packages/accelerate/commands/launch.py", line 909, in launch_command
multi_gpu_launcher(args)
File "/opt/conda/envs/moss/lib/python3.8/site-packages/accelerate/commands/launch.py", line 604, in multi_gpu_launcher
distrib_run.run(args)
File "/opt/conda/envs/moss/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/opt/conda/envs/moss/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/envs/moss/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Do you know what the problem is?
MOSS fails to load custom_autotune into /root/.cache/huggingface/modules/transformers_modules/local/. Just copy custom_autotune manually into /root/.cache/huggingface/modules/transformers_modules/local/ and it will work.
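For example, that manual copy could look like the sketch below. The destination path is the one from the traceback above; the source path is an assumption, so point it at wherever custom_autotune.py sits in your local MOSS checkout or checkpoint directory.

```python
# Workaround sketch: copy custom_autotune.py next to the cached remote-code modules
# so that `from .custom_autotune import *` in quantization.py can resolve.
import shutil
from pathlib import Path

src = Path("moss/models/custom_autotune.py")  # assumed location in the local MOSS repo
dst = Path("/root/.cache/huggingface/modules/transformers_modules/local/custom_autotune.py")

dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(src, dst)
print(f"copied {src} -> {dst}")
```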
@yangzhipeng1108 Hi, I keep getting the following error:
size mismatch for transformer.h.31.attn.qkv_proj.qweight: copying a param with shape torch.Size([1536, 18432]) from checkpoint, the shape in current model is torch.Size([768, 18432]).
size mismatch for transformer.h.31.attn.qkv_proj.qzeros: copying a param with shape torch.Size([48, 4608]) from checkpoint, the shape in current model is torch.Size([48, 2304]).
size mismatch for transformer.h.31.mlp.fc_in.qweight: copying a param with shape torch.Size([1536, 24576]) from checkpoint, the shape in current model is torch.Size([768, 24576]).
size mismatch for transformer.h.31.mlp.fc_in.qzeros: copying a param with shape torch.Size([48, 6144]) from checkpoint, the shape in current model is torch.Size([48, 3072]).
size mismatch for transformer.h.31.mlp.fc_out.qweight: copying a param with shape torch.Size([6144, 6144]) from checkpoint, the shape in current model is torch.Size([3072, 6144]).
size mismatch for transformer.h.31.mlp.fc_out.qzeros: copying a param with shape torch.Size([192, 1536]) from checkpoint, the shape in current model is torch.Size([192, 768]).
size mismatch for transformer.h.32.attn.out_proj.qweight: copying a param with shape torch.Size([1536, 6144]) from checkpoint, the shape in current model is torch.Size([768, 6144]).
size mismatch for transformer.h.32.attn.out_proj.qzeros: copying a param with shape torch.Size([48, 1536]) from checkpoint, the shape in current model is torch.Size([48, 768]).
size mismatch for transformer.h.32.attn.qkv_proj.qweight: copying a param with shape torch.Size([1536, 18432]) from checkpoint, the shape in current model is torch.Size([768, 18432]).
size mismatch for transformer.h.32.attn.qkv_proj.qzeros: copying a param with shape torch.Size([48, 4608]) from checkpoint, the shape in current model is torch.Size([48, 2304]).
size mismatch for transformer.h.32.mlp.fc_in.qweight: copying a param with shape torch.Size([1536, 24576]) from checkpoint, the shape in current model is torch.Size([768, 24576]).
size mismatch for transformer.h.32.mlp.fc_in.qzeros: copying a param with shape torch.Size([48, 6144]) from checkpoint, the shape in current model is torch.Size([48, 3072]).
size mismatch for transformer.h.32.mlp.fc_out.qweight: copying a param with shape torch.Size([6144, 6144]) from checkpoint, the shape in current model is torch.Size([3072, 6144]).
size mismatch for transformer.h.32.mlp.fc_out.qzeros: copying a param with shape torch.Size([192, 1536]) from checkpoint, the shape in current model is torch.Size([192, 768]).
size mismatch for transformer.h.33.attn.out_proj.qweight: copying a param with shape torch.Size([1536, 6144]) from checkpoint, the shape in current model is torch.Size([768, 6144]).
size mismatch for transformer.h.33.attn.out_proj.qzeros: copying a param with shape torch.Size([48, 1536]) from checkpoint, the shape in current model is torch.Size([48, 768]).
size mismatch for transformer.h.33.attn.qkv_proj.qweight: copying a param with shape torch.Size([1536, 18432]) from checkpoint, the shape in current model is torch.Size([768, 18432]).
size mismatch for transformer.h.33.attn.qkv_proj.qzeros: copying a param with shape torch.Size([48, 4608]) from checkpoint, the shape in current model is torch.Size([48, 2304]).
size mismatch for transformer.h.33.mlp.fc_in.qweight: copying a param with shape torch.Size([1536, 24576]) from checkpoint, the shape in current model is torch.Size([768, 24576]).
size mismatch for transformer.h.33.mlp.fc_in.qzeros: copying a param with shape torch.Size([48, 6144]) from checkpoint, the shape in current model is torch.Size([48, 3072]).
size mismatch for transformer.h.33.mlp.fc_out.qweight: copying a param with shape torch.Size([6144, 6144]) from checkpoint, the shape in current model is torch.Size([3072, 6144]).
size mismatch for transformer.h.33.mlp.fc_out.qzeros: copying a param with shape torch.Size([192, 1536]) from checkpoint, the shape in current model is torch.Size([192, 768]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 762) of binary: /opt/conda/envs/moss/bin/python3.8
Traceback (most recent call last):
File "/opt/conda/envs/moss/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/envs/moss/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/opt/conda/envs/moss/lib/python3.8/site-packages/accelerate/commands/launch.py", line 909, in launch_command
multi_gpu_launcher(args)
File "/opt/conda/envs/moss/lib/python3.8/site-packages/accelerate/commands/launch.py", line 604, in multi_gpu_launcher
distrib_run.run(args)
File "/opt/conda/envs/moss/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/opt/conda/envs/moss/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/envs/moss/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Is this a size mismatch? What's going on here?
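The factor-of-two differences above look consistent with a quantization bit-width mismatch: in GPTQ-style packing the packed qweight has in_features * wbits / 32 rows, so an int8 checkpoint loaded into a model instantiated with wbits = 4 produces exactly these shapes. A quick sanity check of that arithmetic is sketched below; the packing convention is an assumption based on common GPTQ implementations, not read from MOSS's quantization.py.

```python
# Reproduce the qweight/qzeros shapes from the error above, assuming the common GPTQ packing:
#   qweight: [in_features * wbits // 32, out_features]
#   qzeros:  [in_features // groupsize, out_features * wbits // 32]
def packed_shapes(in_features, out_features, wbits, groupsize=128):
    qweight = (in_features * wbits // 32, out_features)
    qzeros = (in_features // groupsize, out_features * wbits // 32)
    return qweight, qzeros

hidden = 6144  # MOSS hidden size
layers = {
    "attn.qkv_proj": (hidden, 3 * hidden),   # 6144 -> 18432
    "mlp.fc_in":     (hidden, 4 * hidden),   # 6144 -> 24576
    "mlp.fc_out":    (4 * hidden, hidden),   # 24576 -> 6144
}
for name, (fin, fout) in layers.items():
    for wbits in (8, 4):
        qw, qz = packed_shapes(fin, fout, wbits)
        print(f"{name:14s} wbits={wbits}: qweight={qw}, qzeros={qz}")

# wbits=8 reproduces the checkpoint shapes, wbits=4 the current-model shapes,
# which suggests config.wbits at load time does not match the int8 checkpoint.
```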
> deepspeed_config: zero_stage: 3 — MOSS's config uses DeepSpeed ZeRO stage 3, so the model is sharded across the cards rather than fully loaded on each one. With batch_size = 1 on 8 cards, the peak usage on every card still exceeds 70 GB. The official repo doesn't provide int8 finetuning; I've adapted it, and int8 finetuning works now.

So do you mean int8 finetuning needs 8 × 80 GB of GPU memory? Could it work with 8 × 48 GB?
Full-precision finetuning of MOSS needs that; int8 doesn't.
Hi all. I'm finetuning on 4 A100s with batch = 1, but it still reports out of memory. What could be the cause?