Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
https://libai.readthedocs.io
Apache License 2.0

Dev update #472

Closed xiezipeng-ML closed 1 year ago

xiezipeng-ML commented 1 year ago
CPFLAME commented 1 year ago

The NOTE in https://github.com/Oneflow-Inc/libai/blob/main/docs/source/tutorials/basics/Features.md saying that ZeRO does not support tensor parallel can be removed.

xiezipeng-ML commented 1 year ago

Depends on https://github.com/Oneflow-Inc/oneflow/pull/9975

CPFLAME commented 1 year ago

Let's add one more change.
In all the configs under libai and projects, change recompute_grad to activation_checkpoint.

Similarly, in files such as text_classification/configs/config.py, change eval_period=500 to evaluation=dict(eval_period=500).
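
For illustration only, the two renames requested above could be applied to the text of a config file roughly as follows. This is a minimal sketch, not the change actually made in this PR: the helper name `migrate_config_source` and the sample input string are hypothetical, and it assumes `eval_period` is written as a plain keyword argument with an integer value.

```python
import re


def migrate_config_source(text: str) -> str:
    """Apply both renames to the source text of one config file (hypothetical helper)."""
    # Rename the flag; \b keeps longer identifiers that merely contain it untouched.
    text = re.sub(r"\brecompute_grad\b", "activation_checkpoint", text)
    # Wrap a bare eval_period keyword into evaluation=dict(...).
    # The negative lookbehind skips occurrences that are already wrapped,
    # so running the function twice gives the same result.
    text = re.sub(
        r"(?<!evaluation=dict\()\beval_period\s*=\s*(\d+)",
        r"evaluation=dict(eval_period=\1)",
        text,
    )
    return text


# Made-up config line, used only to show the effect of the rewrite.
sample = "train.update(dict(recompute_grad=True, eval_period=500))"
print(migrate_config_source(sample))
```

A purely textual rewrite like this would still need a manual review of the resulting diff, since it does not understand how each config nests its keys.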

xiezipeng-ML commented 1 year ago

eval_period

OK.