Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
https://libai.readthedocs.io
Apache License 2.0

refine dist tensor to rank0 #446

Closed CPFLAME closed 1 year ago

CPFLAME commented 1 year ago

What this PR does:

All model_test cases now pass except MT5 under projects. MT5's eager global pipeline parallelism requires adding to_global statements inside model.forward; that is not fixed in this PR.
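
For context, here is a rough sketch of what adding to_global inside a forward pass for eager global pipeline parallelism could look like. It is not the MT5 fix itself. The names `to_stage` and `forward_two_stages` are hypothetical, and the only OneFlow call assumed is `Tensor.to_global(placement=..., sbp=...)`.

```python
def to_stage(tensor, placement, sbp=None):
    """Re-place a global tensor onto the devices of one pipeline stage.

    Hypothetical helper: it only assumes the tensor exposes
    to_global(placement=..., sbp=...), as OneFlow global tensors do.
    """
    if sbp is None:
        # Keep the tensor's current SBP signature; only the placement changes.
        return tensor.to_global(placement=placement)
    return tensor.to_global(placement=placement, sbp=sbp)


def forward_two_stages(x, stage0, stage1, placement0, placement1):
    """Run two pipeline stages, handing the activation over between them."""
    x = to_stage(x, placement0)
    x = stage0(x)
    # Hand-off: the output of stage 0 lives on stage 0's devices, so it has
    # to be moved to stage 1's placement before stage 1 can consume it.
    x = to_stage(x, placement1)
    return stage1(x)
```

In real code the placements would presumably be built with something like `flow.placement("cuda", ranks=[...])`, one per pipeline stage, and `stage0` / `stage1` would be the model's layer groups.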