KarhouTam / FL-bench

Benchmark of federated learning. Dedicated to the community. 🤗
GNU General Public License v3.0

Why does accuracy show 0% after fine-tuning? #71

Closed Elhamnazari1372 closed 5 months ago

Elhamnazari1372 commented 6 months ago

I'm running the default run (python main.py fedavg config/template.yml). I'm getting the following report:

client [79] (test) loss: 0.3858 -> 0.3872 accuracy: 88.50% -> 88.00%
client [28] (test) loss: 0.1150 -> 0.1162 accuracy: 97.62% -> 97.62%
client [99] (test) loss: 0.2672 -> 0.2528 accuracy: 94.20% -> 94.72%
Training... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:09:36
FedAvg's average time taken by each global epoch: 0 min 5.73 sec.
FedAvg's total running time: 0 h 9 m 36 s.
==================== FedAvg Experiment Results: ====================
Format: (before local fine-tuning) -> (after local fine-tuning) So if finetune_epoch = 0, x.xx% -> 0.00% is normal.
{100: {'all_clients': {'test': {'loss': '0.3384 -> 0.0000', 'accuracy': '91.31% -> 0.00%'}}}}
========== FedAvg Convergence on train clients ==========
test (before local training):
10.0%(13.14%) at epoch: 1
20.0%(24.33%) at epoch: 3
60.0%(63.00%) at epoch: 7
70.0%(74.64%) at epoch: 9
80.0%(82.61%) at epoch: 16
90.0%(91.24%) at epoch: 40
test (after local training):
80.0%(81.93%) at epoch: 0
90.0%(90.20%) at epoch: 1
==================== FedAvg Max Accuracy ====================
all_clients:
(test) before fine-tuning: 91.31% at epoch 100
(test) after fine-tuning: 0.00% at epoch 100

Why is accuracy showing 0% after fine-tuning?

thanks for your help.

KarhouTam commented 6 months ago

Format: (before local fine-tuning) -> (after local fine-tuning) So if finetune_epoch = 0, x.xx% -> 0.00% is normal.

finetune_epoch is set to 0 in template.yml https://github.com/KarhouTam/FL-bench/blob/b19d9350dc73496e7b85372061fea4be91505e8d/config/template.yml#L24
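
A minimal sketch of the fix, assuming you copy config/template.yml to a new file (my_config.yml is just a placeholder name) and change only this line:

# my_config.yml, copied from config/template.yml
common:
  finetune_epoch: 5 # any value > 0 runs local fine-tuning before evaluation

Then run python main.py fedavg my_config.yml so the change is actually picked up.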

KarhouTam commented 5 months ago

This issue is closed due to a long time without a response.

Elhamnazari1372 commented 5 months ago

I changed it as you recommended but got the same results. It seems fine-tuning is still not running.

[screenshot: 20240531_181540]

KarhouTam commented 5 months ago

Sorry for my late response. What's your run command? If you set finetune_epoch, you need to specify your config file in the command, like python main.py fedavg your_config.yml

Elhamnazari1372 commented 5 months ago

I use the same command as you mentioned. My config is:

# Full explanations are listed on README.md
mode: parallel # [serial, parallel]

parallel: # It's fine to keep these configs.
  # Go check doc of `https://docs.ray.io/en/latest/ray-core/api/doc/ray.init.html` for more details.
  ray_cluster_addr: null # [null, auto, local]

  # `null` implies that all cpus/gpus are included.
  num_cpus: null
  num_gpus: null

  # should be set larger than 1, or training mode fallback to `serial`
  # Set a larger `num_workers` can further boost efficiency, also let each worker have less computational resources.
  num_workers: 2

common:
  dataset: mnist
  seed: 42
  model: lenet5
  join_ratio: 0.1
  global_epoch: 100
  local_epoch: 5
  finetune_epoch: 20
  batch_size: 32
  test_interval: 100
  straggler_ratio: 0
  straggler_min_local_epoch: 0
  external_model_params_file: ""
  optimizer:
    name: sgd # [sgd, adam, adamw, rmsprop, adagrad]
    lr: 0.01
    dampening: 0 # SGD
    weight_decay: 0
    momentum: 0 # [SGD, RMSprop]
    alpha: 0.99 # RMSprop
    nesterov: false # SGD
    betas: [0.9, 0.999] # [Adam, AdamW]
    amsgrad: false # [Adam, AdamW]

  lr_scheduler:
    name: step # null for deactivating
    step_size: 10

  eval_test: true
  eval_val: false
  eval_train: false

  verbose_gap: 10
  visible: false
  use_cuda: true
  save_log: true
  save_model: false
  save_fig: true
  save_metrics: true
  check_convergence: true

# You can set specific arguments for FL methods also
# FL-bench uses FL method arguments by args.<method>.<arg>
# e.g.
fedprox:
  mu: 0.01
pfedsim:
  warmup_round: 0.7
# ...

# NOTE: For those unmentioned arguments, the default values are set in get_<method>_args() in src/server/<method>.py

KarhouTam commented 5 months ago

I tested it in my workspace and everything works fine.

Here are the result, config, and commands to reproduce it:

Result

==================== FedAvg Experiment Results: ====================                                                                                                                                                      
Format: (before local fine-tuning) -> (after local fine-tuning) So if finetune_epoch = 0, x.xx% -> 0.00% is normal.                                                                                                       
{100: {'all_clients': {'test': {'loss': '0.3364 -> 0.3116', 'accuracy': '91.44% -> 92.18%'}}}}                                                                                                                            
========== FedAvg Convergence on train clients ==========                                                                                                                                                                 
test (before local training):                                                                                                                                                                                             
10.0%(11.65%) at epoch: 0                                                                                                                                                                                                 
20.0%(27.31%) at epoch: 3                                                                                                                                                                                                 
30.0%(35.33%) at epoch: 4                                                                                                                                                                                                 
40.0%(47.46%) at epoch: 5                                                                                                                                                                                                 
60.0%(63.21%) at epoch: 7                                                                                                                                                                                                 
70.0%(75.43%) at epoch: 9                                                                                                                                                                                                 
80.0%(86.50%) at epoch: 18                                                                                                                                                                                                
90.0%(90.34%) at epoch: 37                                                                                                                                                                                                
test (after local training):                                                                                                                                                                                              
80.0%(82.13%) at epoch: 0                                                                                                                                                                                                 
90.0%(91.06%) at epoch: 1                                                                                                                                                                                                 
==================== FedAvg Max Accuracy ====================                                                                                                                                                             
all_clients:                                                                                                                                                                                                              
(test) before fine-tuning: 91.44% at epoch 100                                                                                                                                                                            
(test) after fine-tuning: 92.18% at epoch 100     

Config

# cfg.yml
mode: parallel # [serial, parallel]

parallel: # It's fine to keep these configs.
  # Go check doc of `https://docs.ray.io/en/latest/ray-core/api/doc/ray.init.html` for more details.
  ray_cluster_addr: null # [null, auto, local]

  # `null` implies that all cpus/gpus are included.
  num_cpus: null
  num_gpus: null

  # should be set larger than 1, or training mode fallback to `serial`
  # Set a larger `num_workers` can further boost efficiency, also let each worker have less computational resources.
  num_workers: 2

common:
  dataset: mnist
  seed: 42
  model: lenet5
  join_ratio: 0.1
  global_epoch: 100
  local_epoch: 5
  finetune_epoch: 5
  batch_size: 32
  test_interval: 100
  straggler_ratio: 0
  straggler_min_local_epoch: 0
  external_model_params_file: ""
  buffers: local # [local, global, drop]
  optimizer:
    name: sgd # [sgd, adam, adamw, rmsprop, adagrad]
    lr: 0.01
    dampening: 0 # SGD
    weight_decay: 0
    momentum: 0 # [SGD, RMSprop]
    alpha: 0.99 # RMSprop
    nesterov: false # SGD
    betas: [0.9, 0.999] # [Adam, AdamW]
    amsgrad: false # [Adam, AdamW]

  lr_scheduler:
    name: step # null for deactivating
    step_size: 10

  eval_test: true
  eval_val: false
  eval_train: false

  verbose_gap: 10
  visible: false
  use_cuda: true
  save_log: true
  save_model: false
  save_fig: true
  save_metrics: true
  check_convergence: true

# You can set specific arguments for FL methods also
# FL-bench uses FL method arguments by args.<method>.<arg>
# e.g.
fedprox:
  mu: 0.01
pfedsim:
  warmup_round: 0.7
# ...

# NOTE: For those unmentioned arguments, the default values are set in `get_<method>_args()` in `src/server/<method>.py`

Commands

python generate_data.py -d mnist -a 0.1 -cn 100
python main.py fedavg cfg.yml

Elhamnazari1372 commented 5 months ago

Thanks for your response. Could I ask what config I can use for resnet18 and cifar10 to get the best accuracy?

KarhouTam commented 5 months ago

There are tons of variables that can affect the final accuracy. Sorry, I can't tell you the optimal config.
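
That said, if you just want a starting point rather than an optimal config, you could adapt the working MNIST config above. A rough, untuned sketch; the res18 identifier and the momentum choice are assumptions here, so verify the model name against the repo's model list before use:

common:
  dataset: cifar10
  model: res18 # assumed FL-bench name for ResNet-18; verify before use
  global_epoch: 100
  local_epoch: 5
  finetune_epoch: 5
  batch_size: 32
  optimizer:
    name: sgd
    lr: 0.01
    momentum: 0.9 # a common choice for ResNet-style models; tune as needed

And generate the CIFAR-10 partitions first, following the same pattern as the MNIST command above: python generate_data.py -d cifar10 -a 0.1 -cn 100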

Elhamnazari1372 commented 5 months ago

Is there a config that you used and got reasonable results with? Thanks.

KarhouTam commented 5 months ago

Just try it yourself.