Open matthieuhumeau opened 3 weeks ago
Hey @matthieuhumeau, thanks for the detailed report. I also believe we need to reset the strategy in the current check, since it's being kept as ddp_spawn.
In the meantime you can remove that manually after training, e.g.
model.fit(train)
del model.models[0].trainer_kwargs['strategy']
p = model.predict(train)
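If more than one model was fitted, the same workaround can be applied to each of them. The following is only a rough sketch, not part of neuralforecast: drop_trainer_strategy is a hypothetical helper name, and it assumes every entry in model.models exposes a trainer_kwargs dict, as in the snippet above. The stand-in objects at the bottom only mimic that shape so the helper can be tried without GPUs.

```python
from types import SimpleNamespace


def drop_trainer_strategy(nf):
    """Delete the 'strategy' key from each model's trainer_kwargs.

    Hypothetical helper: `nf` is assumed to have a `.models` list whose
    items each carry a `trainer_kwargs` dict. Returns the number of
    models that had a 'strategy' entry removed.
    """
    removed = 0
    for m in nf.models:
        if "strategy" in m.trainer_kwargs:
            del m.trainer_kwargs["strategy"]
            removed += 1
    return removed


# Stand-in objects with the assumed shape (not real neuralforecast models).
fake_nf = SimpleNamespace(
    models=[
        SimpleNamespace(trainer_kwargs={"strategy": "ddp_spawn", "max_steps": 10}),
        SimpleNamespace(trainer_kwargs={"max_steps": 5}),
    ]
)
n_removed = drop_trainer_strategy(fake_nf)
```

With a real object this would be called between model.fit(train) and model.predict(train). Calling it twice is harmless, since models without a strategy key are skipped.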
What happened + What you expected to happen
The predict method fails with the following error when the model has been trained on multi-GPU with the ddp_spawn strategy:
TypeError: vstack(): argument 'tensors' (position 1) must be tuple of Tensors, not NoneType
This seems to be an issue with the PyTorch Lightning Trainer returning None when calling predict with multi-GPU. Looks like there is already an existing fix for this: https://github.com/Nixtla/neuralforecast/pull/391/files
But the issue persists on my side. I was able to resolve it by modifying common/_base_windows.py to drop the strategy argument from my trainer_kwargs.
I'm using:
Versions / Dependencies
Running this on Sagemaker (AL2, 5.10.215-203.850.amzn2.x86_64)
Python 3.10
torch==2.1.0
pytorch-lightning==2.2.5
neuralforecast==1.7.2
Reproduction script
Issue Severity
Medium: It is a significant difficulty but I can work around it.