Closed xiaozhu1106 closed 1 year ago
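I'd like to run continued pre-training on data from my own domain. How can I do this on a single node with multiple A100 GPUs? The provided script/train.sh script does not run.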
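Did you manage to get it running? If so, how did you work through the problems step by step? I'm currently stuck at deepspeed.initialize, which fails with KeyError: 0 at the line: if partition_id in self.param_to_partition_ids[group_id][param_id]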