Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
https://libai.readthedocs.io
Apache License 2.0
391 stars 55 forks

Use the tensor's device in distributed eval #557

fpzh2011 closed 1 month ago

fpzh2011 commented 1 month ago

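When libai/evaluation/evaluator.py calls tensor_to_rank0, it does not pass the device argument, so the call always falls back to the default, cuda, and devices such as npu/xpu are not supported. For a global tensor, placement.type can be used as the value of the device argument.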
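A minimal sketch of the idea, not the actual patch: the helper name `infer_device_type` is hypothetical, and the stand-in objects below only mimic the `is_global`, `placement.type`, and `device.type` attributes of a OneFlow tensor so the snippet runs without OneFlow installed. The fallback to `device.type` for local tensors is my assumption; the proposal above only covers global tensors.

```python
from types import SimpleNamespace


def infer_device_type(tensor):
    """Return the device type string ("cuda", "npu", ...) the tensor lives on.

    Hypothetical helper: a global tensor reports its device via
    placement.type, a local tensor via device.type.
    """
    if getattr(tensor, "is_global", False):
        return tensor.placement.type
    return tensor.device.type


# Stand-ins for real tensors (assumption: only the attributes used above matter).
global_tensor = SimpleNamespace(is_global=True, placement=SimpleNamespace(type="npu"))
local_tensor = SimpleNamespace(is_global=False, device=SimpleNamespace(type="cuda"))

print(infer_device_type(global_tensor))
print(infer_device_type(local_tensor))

# In the evaluator, the call could then look roughly like this
# (sketch only, assuming `value` is the tensor being gathered):
#   value = tensor_to_rank0(value, device=infer_device_type(value))
```

With this, the device passed to tensor_to_rank0 follows the tensor instead of being fixed to cuda.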