OpenBMB/BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0
560 stars, 77 forks
Tensor Parallel #153
Closed
zkh2016 closed this 1 year ago
zkh2016 commented 1 year ago
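Main changes in this PR:
Add a tensor parallel (TP) mode:
https://github.com/OpenBMB/BMTrain/issues/149
Modify the topology to support combinations of PP, TP, and ZeRO
Modify the parameter-related code to support TP mode
Remove PP's separate parameter-partitioning logic and reuse CheckpointBlock's parameter partitioning instead
Adapt save/load to TP mode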
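Supporting PP, TP, and ZeRO together amounts to arranging ranks on a 3D grid and deriving one communication group per axis. The sketch below is a minimal illustration, assuming a hypothetical layout in which TP is the innermost (fastest-varying) dimension, ZeRO/data parallel the middle one, and PP the outermost. The names `Coord`, `rank_to_coord`, `tp_group`, `zero_group`, and `pp_group` are hypothetical and are not BMTrain's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coord:
    pp: int  # pipeline stage index
    dp: int  # data-parallel / ZeRO shard index
    tp: int  # tensor-parallel index


def _dp_size(pp_size: int, tp_size: int, world_size: int) -> int:
    # Ranks left over after PP and TP form the ZeRO (data-parallel) dimension.
    if world_size % (pp_size * tp_size) != 0:
        raise ValueError("world_size must be divisible by pp_size * tp_size")
    return world_size // (pp_size * tp_size)


def rank_to_coord(rank, pp_size, tp_size, world_size):
    dp_size = _dp_size(pp_size, tp_size, world_size)
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    # Hypothetical layout: tp varies fastest, then dp, then pp.
    tp = rank % tp_size
    dp = (rank // tp_size) % dp_size
    pp = rank // (tp_size * dp_size)
    return Coord(pp, dp, tp)


def tp_group(rank, pp_size, tp_size, world_size):
    # Ranks sharing (pp, dp) and differing only in tp are contiguous.
    _dp_size(pp_size, tp_size, world_size)
    base = rank - rank % tp_size
    return list(range(base, base + tp_size))


def zero_group(rank, pp_size, tp_size, world_size):
    # Ranks sharing (pp, tp): stride tp_size within one pipeline stage.
    dp_size = _dp_size(pp_size, tp_size, world_size)
    c = rank_to_coord(rank, pp_size, tp_size, world_size)
    start = c.pp * tp_size * dp_size + c.tp
    return [start + i * tp_size for i in range(dp_size)]


def pp_group(rank, pp_size, tp_size, world_size):
    # Ranks sharing (dp, tp): stride tp_size * dp_size across stages.
    dp_size = _dp_size(pp_size, tp_size, world_size)
    c = rank_to_coord(rank, pp_size, tp_size, world_size)
    start = c.dp * tp_size + c.tp
    return [start + s * tp_size * dp_size for s in range(pp_size)]


if __name__ == "__main__":
    # 8 ranks = 2 pipeline stages x 2 ZeRO shards x 2 TP ranks
    print(rank_to_coord(5, 2, 2, 8))  # Coord(pp=1, dp=0, tp=1)
    print(tp_group(5, 2, 2, 8))       # [4, 5]
    print(zero_group(5, 2, 2, 8))     # [5, 7]
    print(pp_group(5, 2, 2, 8))       # [1, 5]
```

Putting TP innermost keeps each TP group on adjacent ranks, which under this assumption tends to place the most communication-heavy groups on the same node; other orderings are equally valid as long as every group is derived from the same mapping.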
TODO:
Optimize linear: the communication in the backward pass can be overlapped with computation
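The overlap opportunity in the linear backward pass can be illustrated with a column-parallel linear layer. This is a minimal pure-Python sketch, assuming Megatron-style column partitioning of the weight across TP ranks; it is not BMTrain's implementation, and all function names are hypothetical. The input gradient is a sum of per-rank partial products (an all-reduce across TP ranks), while each rank's weight gradient depends only on local data, so the two can in principle run concurrently.

```python
def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_columns(m, parts):
    # Column-partition a matrix into `parts` equal shards (one per TP rank).
    cols = len(m[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by parts")
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in m] for p in range(parts)]


def concat_columns(shards):
    # Inverse of split_columns: join row i of every shard.
    return [sum((s[i] for s in shards), []) for i in range(len(shards[0]))]


def column_parallel_forward(x, w_shards):
    # Each TP rank computes x @ W_p locally; gathering along columns gives x @ W.
    return concat_columns([matmul(x, w) for w in w_shards])


def column_parallel_backward(x, w_shards, grad_out_shards):
    # Each rank's contribution to dL/dx is grad_out_p @ W_p^T; the full
    # gradient is their sum, i.e. an all-reduce across TP ranks (simulated by add()).
    partial = [matmul(g, transpose(w)) for g, w in zip(grad_out_shards, w_shards)]
    grad_x = partial[0]
    for p in partial[1:]:
        grad_x = add(grad_x, p)
    # dL/dW_p = x^T @ grad_out_p needs no communication, so a real
    # implementation could compute it while the all-reduce above is in flight.
    grad_w_shards = [matmul(transpose(x), g) for g in grad_out_shards]
    return grad_x, grad_w_shards
```

In a PyTorch setting, one plausible way to get this overlap is to launch `torch.distributed.all_reduce(grad_x, async_op=True)`, compute the local weight gradient, and then call `wait()` on the returned work handle before `grad_x` is used.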