From what we can tell from experimenting, passing a device such as `cuda:2` to `run_train.py` doesn't work: it appears to still use device 0. (Note that I had to patch the CLI argument parser to accept strings like `cuda:N`; I'd be happy to share that patch.) I'd have expected to see a call to `torch.cuda.set_device(N)` somewhere, e.g. in https://github.com/ACEsuit/mace/blob/6df88277a2971a819b1d6177e9acbd7dc76b7c54/mace/tools/torch_tools.py#L51
Instead, it looks like the full device string, including the `:N`, is passed to various torch calls throughout the code.
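For illustration, here is a hypothetical sketch (these helper names are mine, not MACE's actual API) of the kind of device setup I'd have expected: parse the index out of the device string and pass it to `torch.cuda.set_device`:

```python
# Hypothetical sketch, not MACE's actual code: split a torch-style
# device string like "cuda:2" and set the default CUDA device from it.

def parse_device(device_str):
    """Split a device string like 'cuda:2' into (type, index)."""
    if ":" in device_str:
        dev_type, idx = device_str.split(":", 1)
        return dev_type, int(idx)
    return device_str, None

def init_device(device_str):
    dev_type, index = parse_device(device_str)
    if dev_type == "cuda" and index is not None:
        # This is the call the issue expects to see somewhere,
        # e.g. in torch_tools.py:
        # torch.cuda.set_device(index)
        pass
    return dev_type, index

print(init_device("cuda:2"))  # → ('cuda', 2)
```

Without something like this, code paths that don't thread the device object through (e.g. anything relying on the current/default CUDA device) silently fall back to device 0.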
Has anyone actually tested this functionality?
Note that setting `CUDA_VISIBLE_DEVICES` before running `run_train` is sufficient for us, so perhaps this isn't important and the issue can be closed, but code that silently does the wrong thing seems bad.
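For concreteness, the workaround we use amounts to the following (shown here in Python; the important part is that the variable is set before anything initializes CUDA):

```python
import os

# Workaround: hide all GPUs except physical device 2 before torch
# initializes CUDA, so torch's default "cuda:0" maps onto that GPU.
# Shell equivalent: CUDA_VISIBLE_DEVICES=2 python run_train.py ...
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
print(os.environ["CUDA_VISIBLE_DEVICES"])
```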