Closed ftgreat closed 1 year ago
BMTrain mentions support for bf16 and pipeline parallelism. Are there any usage examples? Can pipeline parallelism and ZeRO be used at the same time? Thanks.
Could @Achazwl please take a look when you have time? Thanks.
pipeline: use bmt.init_distributed(pipe_size=2), and use bmt.PipelineTransformerBlockList instead of bmt.TransformerBlockList. You can put part of the layers in bmt.TransformerBlockList and other layers in bmt.PipelineTransformerBlockList.
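Roughly, the setup could look like this (a minimal sketch, not a drop-in script: ToyLayer is just a placeholder to make it self-contained, and I am assuming the usual bmt.CheckpointBlock wrapper around each layer):

```python
import torch
import bmtrain as bmt

bmt.init_distributed(pipe_size=2)  # 2 pipeline stages; world size must be divisible by pipe_size

# Placeholder layer, only here to make the sketch runnable.
class ToyLayer(bmt.DistributedModule):
    def __init__(self, dim=1024, dtype=torch.half):
        super().__init__()
        self.weight = bmt.DistributedParameter(torch.empty(dim, dim, dtype=dtype))

    def forward(self, x):
        return torch.matmul(x, self.weight)

# Layers that should be pipelined go into PipelineTransformerBlockList;
# the remaining layers can stay in a normal TransformerBlockList.
pipelined_layers = bmt.PipelineTransformerBlockList(
    [bmt.CheckpointBlock(ToyLayer()) for _ in range(8)]
)
other_layers = bmt.TransformerBlockList(
    [bmt.CheckpointBlock(ToyLayer()) for _ in range(4)]
)
bmt.init_parameters(pipelined_layers)
bmt.init_parameters(other_layers)
```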
bf16: You can simply change the dtype of the parameters during model construction. Currently, bmtrain supports bf16 only with bmt.optim.AdamOptimizer, not with bmt.optim.AdamOffloadOptimizer. bf16 is still a beta feature, so if you run into any bugs, please report them.
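For example, a minimal sketch (it assumes bmt.init_distributed has already been called, as in the pipeline sketch above):

```python
import torch
import bmtrain as bmt

class BF16Layer(bmt.DistributedModule):
    def __init__(self, dim=1024):
        super().__init__()
        # Construct the parameter directly in bf16.
        self.weight = bmt.DistributedParameter(torch.empty(dim, dim, dtype=torch.bfloat16))

    def forward(self, x):
        return torch.matmul(x, self.weight)

model = BF16Layer()
bmt.init_parameters(model)

# bf16 currently works with AdamOptimizer, but not with AdamOffloadOptimizer.
optimizer = bmt.optim.AdamOptimizer(model.parameters(), lr=1e-4)
```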
Another question: @Achazwl, you said "You can put part of the layers in bmt.TransformerBlockList and other layers in bmt.PipelineTransformerBlockList". Does this mean combining ZeRO with pipeline? Can both zero2 and zero3 be combined with pipeline? Thanks.
Pipeline can only be combined with zero1, and that combination is already implemented in bmt.PipelineTransformerBlockList. For the layers in bmt.TransformerBlockList, only zero3 is used.
If there are multiple models, and some of them are only used for inference, how should parameter partitioning (similar to zero3) be configured? @Achazwl
Just write each model separately.
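For example, a rough sketch (reusing the hypothetical ToyLayer from the pipeline sketch above): build each model with its own block list so that its parameters are partitioned across ranks, and pass only the trainable model to the optimizer.

```python
import torch
import bmtrain as bmt

# Two independent models; both get their parameters partitioned (zero3-style)
# simply by being built from bmt blocks.
train_model = bmt.TransformerBlockList([bmt.CheckpointBlock(ToyLayer()) for _ in range(4)])
infer_model = bmt.TransformerBlockList([bmt.CheckpointBlock(ToyLayer()) for _ in range(4)])
bmt.init_parameters(train_model)
bmt.init_parameters(infer_model)

# Only the trainable model goes to the optimizer.
optimizer = bmt.optim.AdamOptimizer(train_model.parameters(), lr=1e-4)

# The other model is only ever run forward, under no_grad.
with torch.no_grad():
    x = torch.ones(2, 128, 1024, dtype=torch.half, device="cuda")
    y = infer_model(x)
```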
@Achazwl, I see a dtype check in the code. Does AdamOptimizer support bf16?
https://github.com/OpenBMB/BMTrain/blob/main/bmtrain/optim/adam.py#L65
You can use torch.optim.Adam for bf16.
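A minimal sketch of that fallback, with plain PyTorch only:

```python
import torch

# torch.optim.Adam accepts bf16 parameters directly.
param = torch.nn.Parameter(torch.zeros(1024, 1024, dtype=torch.bfloat16, device="cuda"))
optimizer = torch.optim.Adam([param], lr=1e-4)

loss = (param * param).sum()
loss.backward()
optimizer.step()
```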