Closed: iiacobac closed this issue 11 months ago
Hello, we do not support this at the moment; it may be scheduled into the requirements plan later.
Hello, I'm Ignacio.
The fix has been applied since my comment.
You can compare https://github.com/Ascend/pytorch/blob/v2.0.3/pytorch1.8.1/src/torch/lib/c10d/ProcessGroupHCCL.cpp, where only these data types are supported:
{at::kChar, HCCL_DATA_TYPE_INT8},
{at::kFloat, HCCL_DATA_TYPE_FP32},
{at::kInt, HCCL_DATA_TYPE_INT32},
{at::kHalf, HCCL_DATA_TYPE_FP16},
{at::kShort, HCCL_DATA_TYPE_INT16},
{at::kLong, HCCL_DATA_TYPE_INT64},
with https://github.com/Ascend/pytorch/blob/master/torch_npu/csrc/distributed/ProcessGroupHCCL.cpp, where kBool, among others, is included:
{at::kByte, HCCL_DATA_TYPE_UINT8},
{at::kChar, HCCL_DATA_TYPE_INT8},
{at::kShort, HCCL_DATA_TYPE_INT16},
{at::kInt, HCCL_DATA_TYPE_INT32},
{at::kLong, HCCL_DATA_TYPE_INT64},
{at::kHalf, HCCL_DATA_TYPE_FP16},
{at::kFloat, HCCL_DATA_TYPE_FP32},
{at::kDouble, HCCL_DATA_TYPE_FP64},
{at::kBool, HCCL_DATA_TYPE_UINT8},
{at::kBFloat16, HCCL_DATA_TYPE_BFP16},
BF16 is not supported in the 1.8.1 version, but is supported in the current master version.
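For context, both files build a table like the ones above and look each tensor's dtype up in it before launching a collective, so any dtype missing from the table, such as at::kBool on the 1.8.1 branch, is rejected up front. A minimal sketch of that lookup pattern (the helper name, header paths, and error text here are assumptions, not copied from the repo):

```cpp
#include <map>
#include <stdexcept>
#include <ATen/ATen.h>
#include <hccl/hccl_types.h> // assumed header providing HcclDataType and its constants

// Sketch of the dtype gate: an at::ScalarType absent from this map,
// e.g. at::kBool on the 1.8.1 branch, fails before the collective runs.
const std::map<at::ScalarType, HcclDataType> kScalarTypeToHcclType = {
    {at::kByte, HCCL_DATA_TYPE_UINT8},
    {at::kChar, HCCL_DATA_TYPE_INT8},
    {at::kShort, HCCL_DATA_TYPE_INT16},
    {at::kInt, HCCL_DATA_TYPE_INT32},
    {at::kLong, HCCL_DATA_TYPE_INT64},
    {at::kHalf, HCCL_DATA_TYPE_FP16},
    {at::kFloat, HCCL_DATA_TYPE_FP32},
    {at::kDouble, HCCL_DATA_TYPE_FP64},
    {at::kBool, HCCL_DATA_TYPE_UINT8},    // bool reuses the uint8 wire type
    {at::kBFloat16, HCCL_DATA_TYPE_BFP16},
};

HcclDataType getHcclDataType(at::ScalarType type) {
  auto it = kScalarTypeToHcclType.find(type);
  if (it == kScalarTypeToHcclType.end()) {
    throw std::runtime_error("Input tensor data type is not supported for HCCL process group");
  }
  return it->second;
}
```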
Please apply the same fix for the kBool limitation as was done for NCCL, following this update:
https://github.com/pytorch/pytorch/commit/366c014a7799f0b7bbc258fd6c271dadb99d1de0#diff-43cb0f438d3eb35dec0a1680ddc2d01c3ae9277d91aca4c2119d0b9ea80adeb6
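As I understand that NCCL change, bool tensors travel on the uint8 wire type, and a SUM reduction on bool inputs is remapped to MAX so the result stays a logical OR instead of overflowing uint8. A hedged sketch of the analogous HCCL-side change (the HcclReduceOp values and the helper name are assumptions, not taken from the Ascend repo; header paths vary by PyTorch version):

```cpp
#include <stdexcept>
#include <ATen/ATen.h>
#include <c10d/Types.hpp>     // c10d::ReduceOp; path varies across PyTorch versions
#include <hccl/hccl_types.h>  // assumed header providing HcclReduceOp

// Sketch modeled on the NCCL fix: for bool inputs, SUM becomes MAX so that
// reducing uint8-encoded bools yields a logical OR and cannot overflow.
HcclReduceOp getHcclReduceOp(const c10d::ReduceOp reduceOp, const at::Tensor& input) {
  if (input.scalar_type() == at::kBool && reduceOp == c10d::ReduceOp::SUM) {
    return HCCL_REDUCE_MAX;
  }
  switch (reduceOp) {
    case c10d::ReduceOp::SUM:     return HCCL_REDUCE_SUM;
    case c10d::ReduceOp::MAX:     return HCCL_REDUCE_MAX;
    case c10d::ReduceOp::MIN:     return HCCL_REDUCE_MIN;
    case c10d::ReduceOp::PRODUCT: return HCCL_REDUCE_PROD;
    default:
      throw std::runtime_error("Unsupported reduce op for HCCL process group");
  }
}
```

Together with adding the {at::kBool, HCCL_DATA_TYPE_UINT8} entry shown in the master table above, this is what lets collectives on bool tensors go through.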