Closed ShuhuaGao closed 2 years ago
hi,
thanks for the issue!
could you please try using `evaluate_loader` without any training?
As far as I can see in our implementation, we just run the experiment... so the problem could be with transferring the model/data from the train experiment to the validation experiment.
I tried `num_epochs=0` in `runner.train`, and the same error occurred.
If I instead remove `runner.train` and change the `evaluate_loader` call to
`runner.evaluate_loader(loaders['valid'], model=model)`
then there is no error, though the code is not useful.
so, it looks like we have some problems with the hardware backend 😢 maybe @ditwoo @bagxi could also review it :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
🐛 Bug Report
How To Reproduce
I have two GPUs and enabled both of them. I copied the linear regression minimal example. After that, I checked
Then, the following line produced a long error message:
By contrast, if I use one GPU or the CPU by setting
`os.environ["CUDA_VISIBLE_DEVICES"]`
, then it works. PyTorch `DataParallel` supports inference on multiple GPUs, right? I don't understand why `evaluate_loader` fails with `DataParallelEngine`.

### Environment

```
Catalyst version: 20.04
PyTorch version: 1.11.0
Python version: 3.9
CUDA runtime version: 11.4
Nvidia driver version: 472.39
```