Why are these changes needed?

There is a bug when the executor pulls the model from the aggregator. In the original implementation, the model adapter runs the optimizer step at the executor, although that step should only run at the aggregator. This leads to poor performance for fed-yogi. What I did:

- Turn off the optimizer step at the executor.
- Rename the `gradient_policy` (optimizer) value in several config files from `yogi` to `fed-yogi`. With `yogi`, the `if` branch for fed-yogi in the optimizer is never entered, so the optimizer actually used is fed-avg.
- Change the initial values of fed-yogi's optimizer state (`v_t` and `m_t`) to follow the fed-yogi paper. The original setup might cause the model to drift slightly before it starts to converge.
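```python
        # Second-moment estimate v_t starts at tau for every gradient tensor.
        self.v_t = [torch.full_like(g, self.tau) for g in gradients]
        # First-moment (momentum) estimate m_t starts at zero.
        self.m_t = [torch.full_like(g, 0.0) for g in gradients]
```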
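
For reviewers who want to see how the three changes fit together, here is a minimal sketch in plain Python (no torch). It is not the FedScale implementation. The class `YogiServerOptimizer`, the function `make_server_optimizer`, the `beta1`/`beta2` defaults, and the convention that `delta` is the averaged client update added to the weights are all assumptions made for illustration. Raising an error on an unknown policy name is likewise a suggestion and not part of this PR.

```python
import math


class YogiServerOptimizer:
    """Hypothetical stand-in for a fed-yogi server optimizer (plain floats)."""

    def __init__(self, eta=0.01, tau=0.001, beta1=0.9, beta2=0.99):
        self.eta, self.tau = eta, tau
        self.beta1, self.beta2 = beta1, beta2
        self.m_t = None  # first-moment estimate, created on the first step
        self.v_t = None  # second-moment estimate, created on the first step

    def step(self, weights, delta):
        """Apply one aggregator-side update; delta is the averaged client update."""
        if self.v_t is None:
            # Same initial values as in the snippet above: v_t = tau, m_t = 0.
            self.v_t = [self.tau for _ in delta]
            self.m_t = [0.0 for _ in delta]
        updated = []
        for i, (w, d) in enumerate(zip(weights, delta)):
            d_sq = d * d
            self.m_t[i] = self.beta1 * self.m_t[i] + (1.0 - self.beta1) * d
            # Yogi moves v_t toward d_sq additively, by (1 - beta2) * d_sq.
            sign = (self.v_t[i] > d_sq) - (self.v_t[i] < d_sq)
            self.v_t[i] -= (1.0 - self.beta2) * d_sq * sign
            scale = self.eta / (math.sqrt(self.v_t[i]) + self.tau)
            updated.append(w + scale * self.m_t[i])
        return updated


def make_server_optimizer(gradient_policy):
    """Return an optimizer for the policy, or None for plain averaging (fed-avg)."""
    if gradient_policy == "fed-yogi":
        return YogiServerOptimizer()
    if gradient_policy == "fed-avg":
        return None
    # Rejecting unknown names avoids a silent fall-back such as "yogi" -> fed-avg.
    raise ValueError(f"unknown gradient_policy: {gradient_policy!r}")


if __name__ == "__main__":
    optimizer = make_server_optimizer("fed-yogi")
    print(optimizer.step([1.0], [0.5]))
```

In this sketch only the aggregator would call `step`. An executor pulling the model would copy the returned weights as they are, without running any optimizer step of its own.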

Related issue number

243

Checks

- [x] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
- [x] I've made sure the following tests are passing.
Testing Configurations
- [x] Dry Run (20 training rounds & 1 evaluation round)
- [x] CIFAR-10 (20 training rounds & 1 evaluation round)
- [x] FEMNIST (20 training rounds & 1 evaluation round)

FEMNIST fed-yogi optimizer run result

Screenshot from 2023-12-17 08-17-44

AmberLJC commented 11 months ago

May I know the results of `cat femnist_logging | grep "FL Testing"` when specifying `- gradient_policy: fed-yogi`?

EricDinging commented 11 months ago

@AmberLJC Here is the result:

    - yogi_eta: 0.01
    - yogi_tau: 0.001
    - yogi_beta: 0.01
    - yogi_beta2: 0.99

AmberLJC commented 11 months ago

Thank you!

SymbioticLab / FedScale

Fix fed-yogi executor model download #245
