JF-D / Proteus

Enable Pipeline Parallel for Megatron #4

Closed by tareqmahmood 1 month ago

tareqmahmood commented 1 month ago

According to this line: https://github.com/JF-D/Proteus/blob/0bb4cb1c977e3c0626e18c3ba7b0bfdbc463780c/examples/megatron_gpt.py#L317

the megatron strategy expects pp_deg to be 1. Is there a reason for this?

JF-D commented 1 month ago

The megatron strategy means Megatron tensor parallelism combined with the ZeRO strategy. If you want to use Megatron with pipeline parallelism, just use -ps pp and specify -mp-deg.
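
For readers skimming the thread, here is a minimal sketch of the constraint discussed above. It is not taken from the Proteus source: the function name `validate_strategy` and its parameters `ps` and `mp_deg` are hypothetical, chosen to mirror the `-ps` and `-mp-deg` options and the `pp_deg` variable mentioned in this issue. The actual check in `examples/megatron_gpt.py` may differ.

```python
def validate_strategy(ps: str, pp_deg: int, mp_deg: int) -> None:
    """Hypothetical check mirroring the rule described in this thread.

    ps      -- strategy name, e.g. "megatron" or "pp" (assumed values)
    pp_deg  -- pipeline-parallel degree
    mp_deg  -- tensor (model) parallel degree
    """
    if pp_deg < 1 or mp_deg < 1:
        raise ValueError("parallel degrees must be >= 1")
    # The megatron strategy is tensor parallelism plus ZeRO, so it
    # leaves no room for pipeline stages: pp_deg has to stay at 1.
    if ps == "megatron" and pp_deg != 1:
        raise ValueError(
            "megatron strategy expects pp_deg == 1; "
            "use ps='pp' with mp_deg set for pipeline + tensor parallelism"
        )


# Tensor parallelism only: accepted.
validate_strategy("megatron", pp_deg=1, mp_deg=4)
# Pipeline parallelism combined with tensor parallelism: accepted under "pp".
validate_strategy("pp", pp_deg=2, mp_deg=4)
```

Under these assumptions, asking for `ps="megatron"` with `pp_deg=2` raises a `ValueError`, which is the behavior the original question ran into.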