Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
https://libai.readthedocs.io
Apache License 2.0
391 stars, 55 forks

add_dist_device_type #451

Closed CPFLAME closed 1 year ago

CPFLAME commented 1 year ago

What this PR needs to address:

How to set the device type:

from libai.utils import distributed as dist

dist.set_device_type("cpu")
...
dist.set_device_type("cuda")

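This PR addresses the case where the weights of a very large model cannot be loaded on a single GPU. By setting the device type, the weights can be loaded on the CPU, converted to half precision, and then moved to a single GPU to run.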
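To make the motivation concrete, here is a rough back-of-the-envelope sketch of the memory arithmetic. It does not use LiBai or OneFlow. The helper names (`weight_bytes`, `fits`), the 10B parameter count, and the 32 GiB GPU capacity are all hypothetical values chosen for illustration, and the calculation counts only the weights, ignoring activations and other runtime overhead.

```python
# Rough memory arithmetic; the model size and GPU capacity below are
# hypothetical numbers chosen for illustration, not taken from the PR.

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Return the bytes needed to hold only the weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


def fits(num_params: int, dtype: str, capacity_bytes: int) -> bool:
    """True if the weights alone fit within the given memory capacity."""
    return weight_bytes(num_params, dtype) <= capacity_bytes


GIB = 1024 ** 3
num_params = 10_000_000_000   # hypothetical 10B-parameter model
gpu_capacity = 32 * GIB       # hypothetical 32 GiB GPU

# Full-precision weights need 40e9 bytes, more than the assumed GPU holds,
# while the half-precision copy needs 20e9 bytes and fits.
fp32_fits = fits(num_params, "float32", gpu_capacity)
fp16_fits = fits(num_params, "float16", gpu_capacity)
```

Under these assumed numbers, the full-precision checkpoint would have to be materialized in host memory first, and only the half-precision copy would be small enough to move to the GPU, which is the workflow the device-type switch is meant to enable.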