business-science / modeltime.gluonts

GluonTS Deep Learning with Modeltime
https://business-science.github.io/modeltime.gluonts/

mxnet & gluonts minimum versions required for multiple GPU support? #58

Open joranE opened 11 months ago

joranE commented 11 months ago

I'm having difficulty getting multiple GPUs to work by following the instructions in the vignette. I'm on Ubuntu 20.04 with CUDA 10.1, gluonts 0.8.0, and mxnet-cu101 0.7.0, a setup that works with a single GPU.

However, when I pass:

set_engine(..., ctx = list(mxnet$gpu(0), mxnet$gpu(1)))

I get the following error:

Error: pydantic.error_wrappers.ValidationError: 1 validation error for TrainerModel
ctx
expected string or bytes-like object (type=type_error)

It seems the gluonts model is expecting the ctx argument to be a string, and indeed it issues no complaints if I pass ctx = "gpu".
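
For what it's worth, the wording of the error is the same message Python's re module raises when it is handed something other than a string. The sketch below is plain Python and does not call gluonts at all: parse_ctx and its pattern are hypothetical names made up for illustration, on the assumption (not confirmed from the gluonts source) that ctx is parsed as a device string such as "gpu" or "gpu(1)".

```python
import re

# Hypothetical pattern, for illustration only; the real gluonts validator may differ.
CTX_PATTERN = re.compile(r"^(?P<dev_type>cpu|gpu)(\((?P<dev_id>\d+)\))?$")


def parse_ctx(value):
    """Parse a device string such as "gpu" or "gpu(1)" into (type, id)."""
    # search() raises TypeError when value is not a str or bytes object.
    match = CTX_PATTERN.search(value)
    if match is None:
        raise ValueError(f"bad device string: {value!r}")
    # A missing device id defaults to 0.
    return match["dev_type"], int(match["dev_id"] or 0)


print(parse_ctx("gpu"))
print(parse_ctx("gpu(1)"))

# A list of devices never reaches the pattern match; it fails with a TypeError.
try:
    parse_ctx(["gpu(0)", "gpu(1)"])
except TypeError as err:
    list_error = str(err)
    print("TypeError:", list_error)
```

If the real validator works anything like this, a list of contexts would be rejected before any device is selected, no matter which mxnet or CUDA version is installed.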

I've been trying different combinations of mxnet and gluonts versions with no luck. Part of the problem is that any mxnet-cu101 version higher than 1.7 fails entirely (on CPU or on a single GPU) with an error about not being able to find libnccl2.so. I'm not sure why some versions of mxnet can locate it and others can't.

Is multiple GPU support limited to a particular combination of CUDA, mxnet, and gluonts versions?

Edit: I'm increasingly unsure whether multiple GPU support ever worked at all, despite what appears in the vignette. This is basically the only reference I can find in the gluonts source code to adding multi-GPU support, and it was never merged. The ctx argument has apparently never accepted lists.