databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

HorovodRunner not recognizing multiple GPUs on Databricks #230

Closed mbluestone closed 4 years ago

mbluestone commented 4 years ago

Environment:

Framework: PyTorch
Framework version: 1.5.0
Horovod version: 0.19.1
MPI version: mpirun (Open MPI) 3.0.0
CUDA version: 10.1
NCCL version: 2.7.3
Python version: 3.7.6
OS and version: Ubuntu 18.04.4 LTS
GCC version: 7.5.0

Question:

I am running HorovodRunner on Databricks Runtime 7.0 ML with 3 Standard_NC24 GPU worker instances, and it appears that not all available GPUs are being utilized. Each worker has 4 GPUs, so there are 12 GPUs in total.

I have been running tests using the following code:

import horovod.torch as hvd
from sparkdl import HorovodRunner

def test_fn():
    hvd.init()
    print(hvd.local_rank())

hr = HorovodRunner(np=8)
hr.run(test_fn)

At one point, the output of this code was:

[1,3]<stdout>:1
[1,0]<stdout>:0
[1,1]<stdout>:0
[1,5]<stdout>:2
[1,7]<stdout>:3
[1,4]<stdout>:2
[1,2]<stdout>:1
[1,6]<stdout>:3

I then restarted the cluster and the output was:

[1,6]<stdout>:2
[1,0]<stdout>:0
[1,3]<stdout>:1
[1,7]<stdout>:2
[1,4]<stdout>:1
[1,1]<stdout>:0
[1,2]<stdout>:0
[1,5]<stdout>:1

Why isn't HorovodRunner picking up all of the available GPUs, and why is it doubling or tripling up processes on a few of them? Am I doing something wrong here? Is this an issue with Horovod rather than HorovodRunner?

Any help is greatly appreciated!

mbluestone commented 4 years ago

Turns out the GPU utilization is occurring exactly as it should; the issue we were having was with Comet.ml and its system metrics tracking. I'll go ahead and close this issue.
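For readers hitting the same confusion: `hvd.local_rank()` returns a process's rank within its own worker node, not across the cluster, so the same local-rank values appearing for multiple global ranks is expected whenever `np` spans several workers. A minimal pure-Python sketch of the idea (the helper name is hypothetical, and it assumes ranks are assigned to workers in contiguous blocks; the actual placement depends on the MPI launcher):

```python
def local_ranks(np_total, procs_per_worker):
    """Hypothetical illustration: map each global Horovod rank to the
    local rank it would receive if ranks were assigned to workers in
    contiguous blocks of size procs_per_worker."""
    return {rank: rank % procs_per_worker for rank in range(np_total)}

# np=8 across workers with 4 GPUs each: local ranks 0-3 each appear
# twice, once per worker, so no GPU is shared within a single worker.
mapping = local_ranks(8, 4)
print(sorted(mapping.values()))  # -> [0, 0, 1, 1, 2, 2, 3, 3]
```

This matches the first output above, where each local rank 0-3 is printed by two different global ranks, i.e. one process per GPU on each worker.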