OpenMined / courses

A place where our community can discuss OpenMined Courses, including posting questions, sharing feedback, or providing comments for discussion!
http://courses.openmined.org

Federated Learning | Concept 24 FL for MNIST #403

Open LeonMac opened 2 years ago

LeonMac commented 2 years ago

Description

When the data scientist (DS) launches remote training, the data owner (DO) side reports "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first."

How to Reproduce

  1. Run the code line by line. Everything works until PART 3: Training. (I have a GPU with CUDA.)
  2. Training stalls at epoch 1 and makes no further progress.
  3. On the DO side, the error quoted above is reported: "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first."

Expected Behavior

This is a classic issue in general ML, and I can find the usual solution for it. But how should it be handled when using the FL library, where the training actually happens on the DO side?
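For reference, the usual fix in plain PyTorch is sketched below: copy the tensor to host memory before converting it to NumPy. This is only a local illustration. The helper name `to_numpy` and the sample values are made up for the example, and whether the same call can be issued through the FL library on the DO side is exactly the open question here, not something this sketch answers.

```python
import torch


def to_numpy(t: torch.Tensor):
    """Hypothetical helper: convert a tensor on any device to a NumPy array."""
    # detach() drops the autograd graph, cpu() copies to host memory
    # (it is a no-op for tensors already on the CPU), numpy() then succeeds.
    return t.detach().cpu().numpy()


# Use the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a value produced during training (illustrative numbers only).
loss_values = torch.tensor([0.25, 0.5], device=device)

# Calling loss_values.numpy() directly raises the TypeError above when the
# tensor lives on cuda:0; going through the helper avoids it.
arr = to_numpy(loss_values)
print(arr.shape, arr.dtype)
```

Another workaround, assuming the DO-side code cannot be changed, would be to keep the model and data on the CPU so that no CUDA tensor ever reaches the NumPy conversion.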

System Information