Bert Model Parallel - Githubissues

sywang0111 commented 3 years ago

Does OneFlow support model parallelism for BERT? If so, how do I set it up? If not, is there a planned timeline for adding model_distributed support to more ops?

yuanms2 commented 3 years ago

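Model parallelism in OneFlow is fairly simple and direct. Before using it, though, it is worth checking whether model parallelism is actually necessary.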

From experience, a model like BERT already gets a high speedup from data parallelism alone; see https://github.com/Oneflow-Inc/DLPerf

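If you are training a model at the scale of GPT-3, then you really do need both data parallelism and model parallelism, and even pipeline parallelism. Nvidia's Megatron-LM, implemented on PyTorch, and Microsoft's DeepSpeed, also built on PyTorch, provide this. Supporting similar functionality on OneFlow is also fairly easy; some example code is in progress and we will finish it as soon as we can.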

OneFlow-Megatron-LM benchmark https://github.com/Oneflow-Inc/DLPerf/pull/110/files

deepspeed https://github.com/Oneflow-Inc/DLPerf/pull/109/files

sywang0111 commented 3 years ago

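We only want to measure BERT's model-parallel performance quantitatively; we don't plan to actually train BERT with model parallelism. Does OneFlow currently support model_distributed only for Dense and PReLU? Do you think other ops need model_distributed support? Thank you for your answer, I learned a lot.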

yuanms2 commented 3 years ago

OneFlow can do model parallelism for many common ops. In OneFlow, model parallelism really just means annotating the variable with S(plit) from SBP.
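To make the S(plit) idea concrete, here is a minimal sketch in plain Python. It is not OneFlow code and uses no OneFlow API; the helper names (`matmul`, `split_columns`, `concat_columns`) and the two "devices" are hypothetical. It only illustrates what splitting a dense layer's weight along one axis means: each device holds a slice of the weight, computes its slice of the output, and the concatenated result matches the unsplit computation.

```python
# Illustrative only: plain Python, not OneFlow. All names are hypothetical.
# A dense weight of shape [in, out] is split by columns across devices.

def matmul(x, w):
    """Multiply matrix x (n x k) by matrix w (k x m), both lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_shards):
    """Split w into num_shards column blocks of equal width."""
    cols = len(w[0])
    assert cols % num_shards == 0, "column count must divide evenly"
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]

def concat_columns(parts):
    """Concatenate per-shard outputs along the column axis."""
    return [sum((p[r] for p in parts), []) for r in range(len(parts[0]))]

x = [[1, 2], [3, 4]]               # activations, replicated on every device
w = [[1, 0, 2, 1], [0, 1, 1, 3]]   # full weight, shape [2, 4]

shards = split_columns(w, 2)                      # each device holds a [2, 2] slice
partial_outputs = [matmul(x, s) for s in shards]  # each device computes its slice
y_parallel = concat_columns(partial_outputs)
y_full = matmul(x, w)

print(y_parallel == y_full)
```

In a real framework the slices live on different devices and the concatenation is a communication step; the sketch only shows that the split does not change the result.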

https://github.com/Oneflow-Inc/OneFlow-Benchmark/pull/155/files has a GPT-2 implementation in OneFlow; that example applies model parallelism to attention, dense, and other layers.

In Wide & Deep models, the embedding layer also needs to support model parallelism. The variable of a conv layer can be model-parallel too, but that has no practical value.
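As a rough illustration of a model-parallel embedding layer, the sketch below splits an embedding table by rows (the vocabulary axis) across two hypothetical devices. The thread does not say which axis OneFlow splits on, so the row-wise split is an assumption, and the function names are made up for this example. Each shard returns the real vector for ids it owns and zeros otherwise; summing the partial results recovers the full lookup.

```python
# Illustrative only: plain Python, not OneFlow. Row-wise (vocabulary) split is
# an assumption made for this sketch; all names are hypothetical.

def shard_rows(table, num_shards):
    """Split the table into (start_row, rows) blocks of equal size."""
    rows = len(table)
    assert rows % num_shards == 0, "vocabulary size must divide evenly"
    size = rows // num_shards
    return [(i * size, table[i * size:(i + 1) * size]) for i in range(num_shards)]

def local_lookup(ids, start, rows):
    """Look up ids on one shard; ids owned by other shards yield zero vectors."""
    dim = len(rows[0])
    out = []
    for i in ids:
        if start <= i < start + len(rows):
            out.append(list(rows[i - start]))
        else:
            out.append([0.0] * dim)
    return out

def sum_partials(partials):
    """Element-wise sum of the per-shard results (the reduction step)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]  # vocab 4, dim 2
ids = [3, 0, 2]

partials = [local_lookup(ids, start, rows) for start, rows in shard_rows(table, 2)]
embedded = sum_partials(partials)
reference = [table[i] for i in ids]

print(embedded == reference)
```

Splitting along the embedding dimension instead would follow the column-split pattern of a dense layer, with concatenation in place of the sum.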

sywang0111 commented 3 years ago

Got it, that was helpful. The example is very clear. Thanks again.

Oneflow-Inc / OneFlow-Benchmark

Bert Model Parallel #168