This would make it possible to use MDI to "parallelize" inference on a single host, or at least to use multiple GPUs at the inference stage.
I'm not sure how much of a performance improvement it would actually give (torch.distributed should be faster in any case), but it wouldn't be too difficult to implement.
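For illustration, here's a minimal sketch of what single-host, multi-GPU placement could look like in plain PyTorch. The model, the two-stage split, and the device mapping are all hypothetical, not MDI's actual API; the idea is just that the per-node stages MDI already moves between hosts could be pinned to local `cuda:N` devices instead:

```python
import torch
import torch.nn as nn

# Hypothetical two-stage model split across two local GPUs
# (naive model parallelism; requires at least 2 CUDA devices).
class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(1024, 10).to("cuda:1")

    def forward(self, x):
        # Run the first stage on GPU 0, then move the activation
        # to GPU 1 for the second stage.
        h = self.stage0(x.to("cuda:0"))
        return self.stage1(h.to("cuda:1"))

model = TwoGPUModel().eval()
with torch.inference_mode():
    out = model(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 10])
```

Note this only pipelines the stages sequentially (one activation hop per stage boundary), which is why torch.distributed would likely still win on raw throughput.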