baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[Question] How to continue pretraining on a single machine with multiple GPUs? #94

Closed: xiaozhu1106 closed this issue 1 year ago

xiaozhu1106 commented 1 year ago


Questions

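I want to do continued pretraining on data from my own domain. How can this be done on a single A100 machine with multiple GPUs? The currently provided script/train.sh script does not run successfully.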
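For single-node, multi-GPU runs with DeepSpeed, a common first stumbling block is the batch-size arithmetic: DeepSpeed expects train_batch_size to equal train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs. The sketch below builds such a config as a plain dict. It assumes ZeRO stage 2 and bf16 on A100s; the helper name make_ds_config and the example sizes are hypothetical and are not taken from the Baichuan-7B repository.

```python
import json


def make_ds_config(world_size: int, micro_batch_per_gpu: int, grad_accum_steps: int,
                   zero_stage: int = 2, bf16: bool = True) -> dict:
    """Hypothetical helper: build a DeepSpeed config for one node with several GPUs.

    DeepSpeed checks that
    train_batch_size == train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size,
    so the global batch size is derived here instead of being set by hand.
    """
    if world_size < 1 or micro_batch_per_gpu < 1 or grad_accum_steps < 1:
        raise ValueError("world_size, micro_batch_per_gpu and grad_accum_steps must be >= 1")
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2 or 3")
    return {
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": bf16},  # A100s support bf16 (assumed choice)
        "zero_optimization": {"stage": zero_stage},
    }


if __name__ == "__main__":
    # Assumed setup: 8 GPUs, micro-batch 4 per GPU, 8 accumulation steps.
    cfg = make_ds_config(world_size=8, micro_batch_per_gpu=4, grad_accum_steps=8)
    print(json.dumps(cfg, indent=2))
```

The printed JSON can be saved as a config file and the training script launched with DeepSpeed's own launcher, e.g. `deepspeed --num_gpus=8 <training script> ...`. How the training script accepts the config path (for example a `--deepspeed` or `--deepspeed_config` argument) depends on that script and is an assumption here.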


hingkan commented 1 year ago

Did you get it running? How did you work through the problems step by step? I'm currently stuck at deepspeed.initialize, which raises: KeyError: 0 at the line if partition_id in self.param_to_partition_ids[group_id][param_id]