Closed by kimchitsigai 3 years ago
Environment:
Framework: TensorFlow
Framework version: 2.3.0
Horovod version: 0.21.3
MPI version: 4.0.2
CUDA version: 10.1.2
NCCL version: 2.7.8-1
Python version: 3.7.6
Spark / PySpark version:
Ray version:
OS and version: RHEL 8.1
GCC version: 7.3.0
CMake version: 3.18.0
Bug report:
I'm calling HorovodBasics.init(comm=[[0,1],[2,3]]) because the code at https://github.com/DifferentiableUniverseInitiative/horovod/blob/multiple_communicators/horovod/common/basics.py#L68 appears to be designed for that, but I get an exception at MPI._addressof() at https://github.com/DifferentiableUniverseInitiative/horovod/blob/multiple_communicators/horovod/common/basics.py#L76
Same exception with comm=[[0,1]]
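For context, mpi4py's MPI._addressof() takes an MPI object such as a communicator, not a Python list of ranks. Below is a minimal sketch of turning rank groups like [[0,1],[2,3]] into an mpi4py sub-communicator with Comm.Split. The helper names color_for_rank and build_subcommunicator are hypothetical. Whether the fork's init() expects communicator objects instead of nested rank lists is an assumption that has not been checked against its code.

```python
def color_for_rank(groups, rank):
    """Return the index of the group containing `rank`, or None if no group does."""
    for color, members in enumerate(groups):
        if rank in members:
            return color
    return None


def build_subcommunicator(groups):
    """Split COMM_WORLD so that each rank group gets its own communicator.

    Hypothetical helper; needs mpi4py and an MPI launch (e.g. mpirun -np 4).
    """
    # Imported lazily so color_for_rank() stays usable without MPI installed.
    from mpi4py import MPI

    world = MPI.COMM_WORLD
    rank = world.Get_rank()
    color = color_for_rank(groups, rank)
    # Ranks that belong to no group pass MPI.UNDEFINED and get MPI.COMM_NULL back.
    return world.Split(MPI.UNDEFINED if color is None else color, rank)
```

With groups [[0,1],[2,3]] on four processes, ranks 0 and 1 would share one communicator and ranks 2 and 3 another. Each process holds only the communicator of its own group.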
Thanks a lot, Kimchi