SymbioticLab / FedScale

FedScale is a scalable and extensible open-source federated learning (FL) platform.
https://fedscale.ai
Apache License 2.0

Fix Async #158

Closed fanlai0990 closed 2 years ago

fanlai0990 commented 2 years ago

Why are these changes needed?

In the async FedScale example, (i) training stalls after a while; (ii) there is an API mismatch in Test.

Related issue number

Closes #148

Checks

mosharaf commented 2 years ago

Thanks @fanlai0990.

Quick question: does it address the event mis-ordering issue? (It's fine if not, but we should open a separate issue for it.)

fanlai0990 commented 2 years ago

Not yet. I need more time to think about it (regarding overhead and fidelity). I'll probably push a new fix in early Sept.

The current async example makes some sense, but it relies on clairvoyant knowledge of client completion times and breaks our client arrival traces.

ewenw commented 2 years ago

Does the current async simulation produce reasonably valid results despite not being entirely correct?

fanlai0990 commented 2 years ago

To the best of my knowledge, it provides valid and correct results, barring any other unexpected bugs.

The only deficiency is that client arrivals within the buffer_size do not follow the system trace; they use constant arrivals instead. Note that cross-buffer arrivals still follow the system trace, so it can still provide more realistic evaluations than other existing simulators. Please feel free to test it, and let us know if you find bugs.
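To make the arrival model above concrete, here is a minimal, hypothetical sketch (constant spacing inside a buffer, trace-anchored spacing across buffers). The names `trace_arrivals`, `async_buffer`, and `arrival_interval` are illustrative only and are not FedScale's actual internals:

```python
# Hypothetical sketch of the arrival model described above -- not FedScale's
# actual implementation.
def schedule_arrivals(trace_arrivals, async_buffer=20, arrival_interval=3):
    """Return a list of (client_id, arrival_time) pairs.

    Within a buffer of `async_buffer` clients, arrivals are spaced by a
    constant `arrival_interval`; the start of each buffer is anchored to
    the system trace, so cross-buffer spacing still follows the trace.
    """
    schedule = []
    for buf_start in range(0, len(trace_arrivals), async_buffer):
        # Cross-buffer: the buffer starts at the trace-given arrival time.
        base_time = trace_arrivals[buf_start][1]
        for i, (client_id, _) in enumerate(
                trace_arrivals[buf_start:buf_start + async_buffer]):
            # Intra-buffer: constant spacing instead of trace-given times.
            schedule.append((client_id, base_time + i * arrival_interval))
    return schedule


if __name__ == "__main__":
    # Toy trace: (client_id, trace_arrival_time)
    trace = list(zip(range(6), [0, 5, 9, 40, 47, 55]))
    print(schedule_arrivals(trace, async_buffer=3, arrival_interval=3))
```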

We plan to implement a much more sophisticated version in the future, which should remove this deficiency and the requirement of clairvoyant client completion times. Understandably, this requires reordering events on the fly, among many other pieces. Please stay tuned.
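As a rough illustration (not the planned FedScale implementation), on-the-fly reordering could be handled with a priority queue keyed by simulated completion time; the sketch below is purely hypothetical:

```python
# Hypothetical sketch: keep client-completion events ordered by simulated
# completion time, even when they are generated out of order.
import heapq

class AsyncEventQueue:
    """Min-heap of (completion_time, client_id, update) events."""

    def __init__(self):
        self._heap = []

    def push(self, completion_time, client_id, update):
        # Events may be generated out of order; the heap restores time order.
        heapq.heappush(self._heap, (completion_time, client_id, update))

    def pop_next(self):
        # Always returns the earliest-finishing client, so aggregation
        # happens in simulated-time order.
        return heapq.heappop(self._heap) if self._heap else None


if __name__ == "__main__":
    q = AsyncEventQueue()
    q.push(12.5, "client_3", "update_3")
    q.push(4.2, "client_1", "update_1")
    print(q.pop_next())  # (4.2, 'client_1', 'update_1')
```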

ewenw commented 2 years ago

Hi @fanlai0990, I'm still not seeing any model test results with the latest async code. Were you able to see test outputs when you ran it?

Here are the params I used:

    --data_set femnist
    --data_dir=$(CODE_FETCHER_DEST)/li-cross-device-fl/FedScale/benchmark/dataset/data/femnist
    --data_map_file $(CODE_FETCHER_DEST)/li-cross-device-fl/FedScale/benchmark/dataset/data/femnist/client_data_mapping/train.csv
    --log_path some/path
    --rounds 300
    --eval_interval 2
    --num_participants 800
    --async_buffer 20
    --arrival_interval 3
fanlai0990 commented 2 years ago

Hi @ewenw, thanks for trying it out! I pulled the latest code and tested it with your configuration (more details attached below). I can see the test results. Besides TensorBoard, you can try cat femnist_logging | grep "test_loss". My output is:

    (08-10) 22:45:01 INFO [executor.py:374] After aggregation round 2, CumulTime 90.7529, eval_time 29.2102, test_loss 3.9207, test_accuracy 4.98%, test_5_accuracy 23.55%
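If it helps, here is a small, hypothetical helper for pulling those metrics out of the log programmatically; it only assumes the line format shown above, and the log path is whatever you pass in:

```python
# Hypothetical log parser, based only on the line format shown above.
import re

LINE_RE = re.compile(
    r"After aggregation round (\d+).*?"
    r"test_loss ([\d.]+), test_accuracy ([\d.]+)%")

def parse_test_metrics(log_path):
    """Return a list of (round, test_loss, test_accuracy) tuples."""
    metrics = []
    with open(log_path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if m:
                rnd, loss, acc = m.groups()
                metrics.append((int(rnd), float(loss), float(acc)))
    return metrics

# Example: print(parse_test_metrics("femnist_logging"))
```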

However, I do notice some weird test accuracy and am working on it. In the meantime, please let us know if you have any other concerns or features you would like.


    - job_name: femnist                   # Generate logs under this folder: log_path/job_name/time_stamp
    - log_path: $FEDSCALE_HOME/benchmark # Path of log files
    - num_participants: 800                      # Number of participants per round; we use K=100 in our paper, and large K will be much slower
    - data_set: femnist                     # Dataset: openImg, google_speech, stackoverflow
    - data_dir: $FEDSCALE_HOME/benchmark/dataset/data/femnist    # Path of the dataset
    - data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv              # Allocation of data to each client, turn to iid setting if not provided
    - device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity     # Path of the client trace
    - device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
    - model: shufflenet_v2_x2_0                            # Models: e.g., shufflenet_v2_x2_0, mobilenet_v2, resnet34, albert-base-v2
    - eval_interval: 2                     # How many rounds to run a testing on the testing set
    - rounds: 300                          # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
    - filter_less: 21                       # Remove clients w/ less than 21 samples
    - num_loaders: 2
    - local_steps: 20
    - learning_rate: 0.05
    - batch_size: 20
    - test_bsz: 20
    - use_cuda: False
    - decay_round: 50
    - overcommitment: 1.0
    - async_buffer: 20
    - arrival_interval: 3
ewenw commented 2 years ago

Thank you, @fanlai0990, for the prompt response! I can see the test results now after changing the number of executors to 1. With an increasing number of executors, I see fewer and fewer test data points. I also observe some weirdness in accuracy and loss. [screenshot of accuracy/loss curves]

fanlai0990 commented 2 years ago

Thanks for confirming it! I am fixing it and will get back to you soon.