OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0

[问题]bf16 & pipeline parallel #94

Closed ftgreat closed 1 year ago

ftgreat commented 1 year ago

BMTrain mentions support for bf16 and pipeline parallelism. Are there any usage examples? Also, can pipeline parallelism and ZeRO be used at the same time? Thanks.

ftgreat commented 1 year ago

@Achazwl, could you take a look when you have time? Thanks.

Achazwl commented 1 year ago

Pipeline: use `bmt.init_distributed(pipe_size=2)`, and use `bmt.PipelineTransformerBlockList` instead of `bmt.TransformerBlockList`. You can put part of the layers in `bmt.TransformerBlockList` and the other layers in `bmt.PipelineTransformerBlockList`.

bf16: simply change the dtype of the parameters during model construction. Currently, BMTrain should only support `bmt.optim.AdamOptimizer` with bf16, not `bmt.optim.AdamOffloadOptimizer`. bf16 is still a beta feature, so please report any bugs you encounter.
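The advice above can be sketched roughly as follows. This is a hedged, illustrative sketch, not a verified recipe: the `Block` class and its sizes are hypothetical, `bmt.CheckpointBlock` is assumed to be the usual wrapper for layers placed in a block list, and the script requires a multi-GPU distributed launch (e.g. via `torchrun`), so it is not runnable standalone.

```python
import torch
import bmtrain as bmt

# Enable a 2-stage pipeline (assumption: launched with torchrun across
# multiple GPUs; pipe_size must divide the world size).
bmt.init_distributed(pipe_size=2)

class Block(torch.nn.Module):
    """Hypothetical minimal transformer-style block for illustration."""
    def __init__(self, dim):
        super().__init__()
        # bf16: just build the parameters with dtype=torch.bfloat16.
        self.ff = torch.nn.Linear(dim, dim, dtype=torch.bfloat16)

    def forward(self, x):
        return x + self.ff(x)

# Layers in PipelineTransformerBlockList are split across pipeline stages
# (combined with ZeRO-1, per this thread); layers left in a plain
# TransformerBlockList would use ZeRO-3 instead.
layers = bmt.PipelineTransformerBlockList(
    [bmt.CheckpointBlock(Block(1024)) for _ in range(8)]
)

# Per the thread, bf16 currently works only with bmt.optim.AdamOptimizer,
# not bmt.optim.AdamOffloadOptimizer.
optimizer = bmt.optim.AdamOptimizer(layers.parameters())
```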

ftgreat commented 1 year ago

> You can put part of the layers in bmt.TransformerBlockList and other layers in bmt.PipelineTransformerBlockList.

One more question, @Achazwl: does this refer to combining ZeRO with pipeline? Can both ZeRO-2 and ZeRO-3 be used together with pipeline? Thanks.

Achazwl commented 1 year ago

Pipeline can only be combined with ZeRO-1, and this is already implemented inside `bmt.PipelineTransformerBlockList`. For the layers in `bmt.TransformerBlockList`, only ZeRO-3 is used.

ftgreat commented 1 year ago

If there are multiple models, and some of them are used only for inference, how should parameter partitioning (similar to ZeRO-3) be configured for them? @Achazwl

Achazwl commented 1 year ago

Just construct each model separately.

ftgreat commented 1 year ago

> pipeline: use bmt.init_distributed(pipe_size=2), and use bmt.PipelineTransformerBlockList instead of bmt.TransformerBlockList. You can put part of the layers in bmt.TransformerBlockList and other layers in bmt.PipelineTransformerBlockList. bf16: You can simply change the dtype of parameters during model construction. Currently, bmtrain should only support bmt.optim.AdamOptimizer, but not bmt.optim.AdamOffloadOptimizer. bf16 is still a beta feature, so if there are any bugs during use, please provide feedback

@Achazwl I see a dtype check in the code. Does `AdamOptimizer` support bf16?

https://github.com/OpenBMB/BMTrain/blob/main/bmtrain/optim/adam.py#L65

Achazwl commented 1 year ago

You can use torch.optim.Adam for bf16.
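A minimal sketch of that workaround, using plain PyTorch with no BMTrain involvement (the layer shapes and learning rate are just illustrative):

```python
import torch

# Build a model directly in bf16.
model = torch.nn.Linear(4, 2, dtype=torch.bfloat16)

# torch.optim.Adam handles bf16 parameters, unlike (per this thread)
# BMTrain's bmt.optim.AdamOptimizer at the time of writing.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.randn(8, 4, dtype=torch.bfloat16)
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```

Note that the optimizer states (momentum and variance) are then kept in bf16 as well, which may cost some precision compared to a fused fp32-state optimizer.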