aidecentralized / sonar

SONAR - Self-Organizing Network of Aggregated Representations
MIT License

FedAvg broken with new grpc comm_utils #107

Closed rishi-s8 closed 1 month ago

rishi-s8 commented 1 month ago

Running traditional_fl.py fails with the following error:

INFO:root:Starting clients federated averaging
Starting clients federated averaging
INFO:root:Starting round 0
Starting round 0
ERROR:grpc._server:Exception calling application: 'FedAvgClient' object has no attribute 'round'
Traceback (most recent call last):
  File "/home/risharma/miniconda3/envs/sonar/lib/python3.10/site-packages/grpc/_server.py", line 609, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/mnt/nfs/risharma/MIT/sonar/src/utils/communication/grpc/main.py", line 130, in get_current_round
    round = comm_pb2.Round(round=self.base_node.get_local_rounds())
  File "/mnt/nfs/risharma/MIT/sonar/src/algos/base_class.py", line 249, in get_local_rounds
    return self.round
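
The traceback shows get_local_rounds reading self.round on a FedAvgClient that never had that attribute set. One plausible cause, which the traceback alone does not confirm, is that the counter is only assigned once training starts, so an early gRPC call finds nothing there. Here is a minimal sketch of initializing the counter in the constructor. All class names are hypothetical stand-ins, not the actual SONAR classes:

```python
class BaseNodeSketch:
    """Hypothetical stand-in for the node base class; not the SONAR code."""

    def __init__(self) -> None:
        # Assumption: setting the counter here guarantees it exists
        # before any RPC handler can ask for it.
        self.round = 0

    def get_local_rounds(self) -> int:
        return self.round


class FedAvgClientSketch(BaseNodeSketch):
    def __init__(self) -> None:
        super().__init__()  # makes sure self.round is defined

    def run_round(self) -> None:
        # Stand-in for one round of local training.
        self.round += 1


class BrokenClientSketch(BaseNodeSketch):
    def __init__(self) -> None:
        # Skips super().__init__(), so self.round is never created and
        # get_local_rounds raises AttributeError, as in the log above.
        pass
```

With the counter set in the base constructor, a get_current_round request that arrives before the first training step returns 0 instead of raising.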

Also, I think the synchronous logic for traditional FL will be broken. Can't the following scenario happen?

  1. Clients start training. Each client's local round is 0.
  2. The server requests a model from every client. Because the clients are still training (or are simply slow), they return their initial models. The server aggregates these initial models and increments its local round to 1.
  3. The clients finish training, update their local round to 1, and request the server's latest model, which overwrites the result of their local training. This cycle can then repeat every round.
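
To make the scenario in the list above concrete, here is a small deterministic replay of that interleaving. It is only a sketch: models are single floats, "training" adds 1.0, and the function and variable names are invented for illustration rather than taken from the framework.

```python
def simulate_pull_only(num_clients: int = 3, rounds: int = 2):
    """Replay the problematic interleaving with scalar 'models'."""
    server_model, server_round = 0.0, 0
    client_models = [0.0] * num_clients
    client_rounds = [0] * num_clients
    lost_updates = 0

    for _ in range(rounds):
        # Step 1: clients start training; the result is not visible yet.
        pending = [m + 1.0 for m in client_models]

        # Step 2: the server pulls while training is still running, so it
        # receives the untrained models, aggregates them, and moves on.
        pulled = list(client_models)
        server_model = sum(pulled) / len(pulled)
        server_round += 1

        # Step 3: clients finish, bump their round, then pull the server
        # model, which overwrites what they just trained.
        for i in range(num_clients):
            client_models[i] = pending[i]
            client_rounds[i] += 1
            if client_models[i] != server_model:
                lost_updates += 1
            client_models[i] = server_model

    return server_model, server_round, lost_updates
```

In this replay the round counters on both sides advance, yet the server model never moves away from its initial value, and every local update is discarded.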

Potential solution: Since these algorithms are event-based/causal in nature, I can't think of a way to design this using only receive_model requests from each side, unless there is some additional interaction between the algorithm and comm_utils that keeps track of which pulls have succeeded.
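
One way such pull tracking could look is sketched below. It assumes each reply is tagged with the round the model was trained in, so that a pull counts as successful only when the tag matches the round the server is waiting for. The tagging scheme, collect_round, and make_slow_client are all hypothetical and do not exist in comm_utils.

```python
from typing import Callable, Dict, Tuple

# A pull returns (round_tag, model); models are plain floats in this sketch.
PullFn = Callable[[], Tuple[int, float]]


def collect_round(
    pull_fns: Dict[str, PullFn], expected_round: int, max_attempts: int = 10
) -> Dict[str, float]:
    """Pull from every client until each reply carries `expected_round`."""
    succeeded: Dict[str, float] = {}
    for _ in range(max_attempts):
        for name, pull in pull_fns.items():
            if name in succeeded:
                continue  # this pull already succeeded, do not repeat it
            round_tag, model = pull()
            if round_tag == expected_round:
                succeeded[name] = model
        if len(succeeded) == len(pull_fns):
            return succeeded
    raise TimeoutError("some clients never reached the expected round")


def make_slow_client(stale_calls: int, trained_model: float) -> PullFn:
    """Fake client that returns its round-0 model for the first few pulls."""
    calls = {"n": 0}

    def pull() -> Tuple[int, float]:
        calls["n"] += 1
        if calls["n"] <= stale_calls:
            return (0, 0.0)  # still training: stale model, old round tag
        return (1, trained_model)

    return pull
```

The server would aggregate only the dictionary returned by collect_round, so stale initial models are never averaged. The cost is repeated polling, which is part of why this feels less natural than explicit sends.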

A more elegant solution would be to allow both sends and receives. While this adds a layer of implementation complexity for the user, it makes current Decentralized/Federated algorithms easier to implement and reason about.
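
As a rough illustration of the send-and-receive style, the sketch below uses an in-process queue in place of the network. Each client sends its model only after training finishes, and the server blocks on receives until every update has arrived. The function name and the scalar "model" are assumptions made for the example; a real version would go through the gRPC layer.

```python
import queue
import threading


def run_push_round(initial_model: float, num_clients: int) -> float:
    """One synchronous FedAvg-style round using explicit sends."""
    inbox: "queue.Queue[float]" = queue.Queue()

    def client(client_id: int) -> None:
        # Stand-in for local training: each client shifts the model.
        trained = initial_model + client_id + 1
        # Explicit send, issued only once training is done.
        inbox.put(trained)

    threads = [
        threading.Thread(target=client, args=(i,)) for i in range(num_clients)
    ]
    for t in threads:
        t.start()

    # Blocking receives: the server cannot aggregate until every client
    # has actually sent a trained model.
    updates = [inbox.get(timeout=5) for _ in range(num_clients)]
    for t in threads:
        t.join()

    return sum(updates) / num_clients
```

Because the server waits on receives rather than pulling at an arbitrary moment, the ordering "train, then aggregate" follows from the communication pattern itself instead of relying on timing.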