Federated Learning | Concept 24 FL for MNIST

Description

When the DS launches a remote training run, the DO side reports: "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first."

How to Reproduce

Run the code line by line. Everything works fine until PART 3: Training. (I have a GPU and CUDA.)
Training stops at epoch 1 and makes no further progress.
On the DO side I can see the error quoted above: "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first."

Expected Behavior

This is a classic issue in general ML and the usual solution is easy to find, but how should it be handled when using the FL library, where the training actually happens on the DO side?
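
For reference, the usual fix in plain PyTorch is sketched below: copy the tensor to host memory before converting it to NumPy. This is only a minimal illustration of what the error message asks for, not the FL library's own code. The helper name `to_numpy` is hypothetical, and where the equivalent step belongs in the remote training path on the DO side is exactly the open question here.

```python
import numpy as np
import torch


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: return a NumPy copy of a tensor on any device."""
    # detach() drops the autograd graph; cpu() copies a CUDA tensor to host
    # memory and is a no-op for a tensor that already lives on the CPU.
    return t.detach().cpu().numpy()


# Use the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 3, device=device, requires_grad=True)
y = x * 2
loss = y.sum()

# Calling y.numpy() directly fails for a CUDA tensor (and for any tensor
# that requires grad); going through the helper works in both cases.
arr = to_numpy(y)

# For a scalar such as a loss, item() returns a plain Python number.
loss_value = loss.item()
```

A possible workaround, if the conversion cannot be changed on the DO side, is to keep the model and data on the CPU so that no CUDA tensor ever reaches the NumPy conversion. That trades training speed for compatibility.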

System Information

OS: Ubuntu 18.04
Language Version: Python 3.7.10, torch 1.8.1, torchvision 0.9.1
Package Manager Version: [e.g. conda 4.11.0, pip 21.2.2 ]

OpenMined / courses

Federated Learning | Concept 24 FL for MNIST #403
