InternLM / InternEvo
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
https://internevo.readthedocs.io/zh-cn/latest/?badge=latest
Apache License 2.0
Feat/refactor process group #358
Open · mwiacx opened 3 weeks ago
mwiacx commented 3 weeks ago
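Refactor how ProcessGroups are constructed. The previous code was a fairly typical case of object orientation for its own sake.
Improvements:
[x] Reuse the common rank-allocation logic
[x] Easy-to-understand definitions for combining and splitting parallel dimensions
[x] Easy to change the allocation priority order within a parallel combination
[x] Support nested, multi-level definitions of parallel-dimension combinations
[x] Support anonymous dimensions within a parallel combination (for example, a middle dimension) that only hold a place and do not create a ProcessGroup
[x] More uniform support for creating incomplete groups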
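To make the first few items concrete, here is a minimal sketch of a shared rank-allocation routine driven by an ordered list of dimensions. It is not the code in this PR: `build_groups`, the `(name, size)` tuple format, and the example ordering are all hypothetical, and it only computes rank lists instead of calling `torch.distributed.new_group`. Nested combinations and incomplete groups are not covered.

```python
from itertools import product
from math import prod


def build_groups(world_size, dims):
    """Compute rank groups for an ordered list of (name, size) dimensions.

    Hypothetical helper for illustration only.
    dims is ordered from the innermost (fastest-varying) dimension to the
    outermost. A name of None marks an anonymous dimension: it holds its
    place in the layout, but no groups are returned for it.
    At most one size may be -1, meaning "whatever is left of world_size".
    """
    sizes = [size for _, size in dims]
    known = prod(s for s in sizes if s != -1)
    if sizes.count(-1) > 1 or world_size % known != 0:
        raise ValueError("dimension sizes do not fit world_size")
    sizes = [world_size // known if s == -1 else s for s in sizes]
    if prod(sizes) != world_size:
        raise ValueError("product of dimension sizes must equal world_size")

    # Stride of a dimension = product of the sizes of all inner dimensions.
    strides = [prod(sizes[:i]) for i in range(len(sizes))]

    groups = {}
    for i, (name, _) in enumerate(dims):
        if name is None:
            continue  # placeholder only: no group is created
        others = [range(s) for j, s in enumerate(sizes) if j != i]
        other_strides = [st for j, st in enumerate(strides) if j != i]
        dim_groups = []
        # Fix the coordinates of every other dimension, vary this one.
        for coords in product(*others):
            base = sum(c * st for c, st in zip(coords, other_strides))
            dim_groups.append([base + k * strides[i] for k in range(sizes[i])])
        groups[name] = sorted(dim_groups)
    return groups


if __name__ == "__main__":
    # Illustrative ordering only; the real priority order is defined by the PR.
    layout = [("tp", 2), ("pp", 2), ("dp", -1)]
    for name, ranks in build_groups(16, layout).items():
        print(name, ranks)
```

In this sketch, changing the allocation priority amounts to reordering the list, and replacing a name with `None` keeps the layout of the remaining dimensions unchanged while skipping group creation for that slot.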
Unit tests:
[x] mtp: world_size = 16, zero1 = -1
[x] mtp: world_size = 16, tp = 4, zero1.5 = 2
[x] mtp: world_size = 16, tp = 2, pp = 2, zero1.5 = -1
[x] mtp moe: world_size = 16, tp = 2, pp = 2, ep = 4, ep_no_tp = false, zero1 = -1
[x] mtp moe: world_size = 16, tp = 2, pp = 1, ep = 2, ep_no_tp = true, zero1 = 2
[x] msp/fsp: world_size = 16, zero1 = -1
[x] msp/fsp: world_size = 16, tp = 4, zero1.5 = 2
[x] msp/fsp: world_size = 16, tp = 2, pp = 2, zero1.5 = -1
[x] msp/fsp moe: world_size = 16, tp = 2, pp = 2, ep = 4, ep_no_tp = false, zero1 = -1
[x] msp/fsp moe: world_size = 16, tp = 2, pp = 1, ep = 2, ep_no_tp = true, zero1 = 2
[x] isp: world_size = 16, zero = -1
[x] isp: world_size = 16, sp = 4, pp = 2, zero = -1
[x] isp: world_size = 16, wp = 4, pp = 1, zero1.5 = 2
[x] isp: world_size = 16, sp = 2, wp = 2, pp = 2, zero = -1
[x] isp moe: world_size = 16, sp = 2, wp = 2, ewp = 4, ep = 2, pp = 2, zero = -1
[x] isp moe: world_size = 16, sp = 2, wp = 2, ewp = 2, ep = 4, pp = 2, zero = -1
[x] isp 2d attn: world_size = 16, sp = 4, wp = 4, pp = 2, zero = -1, hp = 2, cp = 2, window_size = 1, head_first = True, interleaved = False
[x] isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, zero = -1, hp = 4, cp = 2, window_size = 2, head_first = False, interleaved = False
[x] isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, zero = -1, hp = 1, cp = 8, window_size = 4, head_first = False, interleaved = True
[x] isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, zero = -1, hp = 2, cp = 4, window_size = 2, head_first = False, interleaved = True
[x] isp 2d attn moe: world_size = 16, sp = 4, wp = 4, pp = 2, ewp = 4, ep = 2, zero = -1, hp = 2, cp = 2, window_size = 2, head_first = True, interleaved = False
[x] isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, ewp = 2, ep = 4, zero = -1, hp = 2, cp = 4, window_size = 2, head_first = False, interleaved = True
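As a quick sanity check on the 2D-attention cases above, the sketch below encodes four of them as plain dicts and tests two relationships. Both are assumptions rather than rules stated in this PR: `hp * cp == sp` is simply what every listed case satisfies, and the divisibility of `world_size` by `sp * pp` and `wp * pp` is an assumed layout constraint. `check_case` is a hypothetical helper.

```python
# Four of the "isp 2d attn" cases listed above, as plain dicts.
cases = [
    {"world_size": 16, "sp": 4, "wp": 4, "pp": 2, "hp": 2, "cp": 2},
    {"world_size": 16, "sp": 8, "wp": 2, "pp": 1, "hp": 4, "cp": 2},
    {"world_size": 16, "sp": 8, "wp": 2, "pp": 1, "hp": 1, "cp": 8},
    {"world_size": 16, "sp": 8, "wp": 2, "pp": 1, "hp": 2, "cp": 4},
]


def check_case(cfg):
    """Return a list of problems found in one configuration (empty if none)."""
    problems = []
    # Assumption: head-parallel and context-parallel sizes factor sp.
    if cfg["hp"] * cfg["cp"] != cfg["sp"]:
        problems.append("hp * cp != sp")
    # Assumption: each of sp and wp, combined with pp, divides world_size.
    for dim in ("sp", "wp"):
        if cfg["world_size"] % (cfg[dim] * cfg["pp"]) != 0:
            problems.append(f"{dim} * pp does not divide world_size")
    return problems


if __name__ == "__main__":
    for cfg in cases:
        print(cfg, check_case(cfg) or "ok")
```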