Closed hongliny closed 3 years ago
I found the solution: tff.learning.build_federated_evaluation(tff_model_fn) should also be replaced with tff.learning.build_federated_evaluation(tff_model_fn, use_experimental_simulation_loop=True).
Closing this issue.
Re-opening for posterity while issues with #33 are being investigated.
We seem to have a good understanding of what is happening. In short, in a multi-GPU environment, set use_experimental_simulation_loop=True for tff.learning functions, and use for ... in iter(dataset) for your customized training loops. Note that dropout and other layers with internal randomness may sometimes give unexpected results and should be used with care. Closing this issue for now.
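For anyone adapting a customized training loop, here is a minimal sketch of the two iteration styles discussed above. It is only an illustration in plain TensorFlow, not the code from research/optimization: the function names sum_with_reduce and sum_with_for_loop and the toy dataset are hypothetical, and summing batches stands in for a real training step.

```python
import tensorflow as tf


def sum_with_reduce(dataset):
    # Style reported to cause trouble on multi-GPU: dataset.reduce folds
    # the whole dataset into a single state inside one dataset op.
    return dataset.reduce(
        tf.constant(0.0),
        lambda total, batch: total + tf.reduce_sum(batch),
    )


def sum_with_for_loop(dataset):
    # Suggested replacement: iterate explicitly and update the state
    # batch by batch.
    total = tf.constant(0.0)
    for batch in iter(dataset):
        total = total + tf.reduce_sum(batch)
    return total


# Hypothetical toy data: four scalars in batches of two.
dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0, 4.0]).batch(2)

reduce_result = float(sum_with_reduce(dataset))
loop_result = float(sum_with_for_loop(dataset))
```

Both functions compute the same value here; the difference is only in how the dataset is traversed, which is what matters for the multi-GPU case described in this thread.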
Hi there,
I am trying to launch multi-GPU experiments based on research/optimization, but I keep getting errors involving dataset.reduce, as below.
I tried to replace this line and this line with for batch in iter(dataset), but the issue persists. I couldn't find any other potential usage of dataset.reduce.
Here is the prompt I used to reproduce this issue.
Any help will be greatly appreciated.