FedML-AI / FedML


Client Index is always 0 in the outputs #359

Closed by mh-lan 2 years ago

mh-lan commented 2 years ago

After running fedml.run_simulation() with the following args:

args = {'yaml_config_file': '', 'run_id': '0', 'rank': 0, 'local_rank': 0, 'yaml_paths': ['C:\\Users\\doubl\\anaconda3\\lib\\site-packages\\fedml\\config/simulation_sp/fedml_config.yaml'], 'training_type': 'simulation', 'random_seed': 0, 'dataset': 'mnist', 'data_cache_dir': '../../../data/mnist', 'partition_method': 'hetero', 'partition_alpha': 0.5, 'model': 'lr', 'federated_optimizer': 'FedAvg', 'client_id_list': '[]', 'client_num_in_total': 1000, 'client_num_per_round': 10, 'comm_round': 200, 'epochs': 1, 'batch_size': 10, 'client_optimizer': 'sgd', 'learning_rate': 0.03, 'weight_decay': 0.001, 'frequency_of_the_test': 5, 'using_gpu': False, 'gpu_id': 0, 'backend': 'sp', 'log_file_dir': './log', 'enable_wandb': False, 'wandb_key': 'ee0b5f53d949c84cee7decbe7a629e63fb2f8408', 'wandb_entity': 'fedml-ai', 'wandb_project': 'simulation', 'run_name': 'fedml_torch_fedavg_mnist_lr'}

every client index is reported as 0 in each communication round:

################Communication round : 0
client_indexes = [993 859 298 553 672 971  27 231 306 706]
client_indexes = [993 859 298 553 672 971  27 231 306 706]
Update Epoch: 0 [10/20 (50%)]   Loss: 2.369121
Update Epoch: 0 [20/20 (100%)]  Loss: 2.222175
Client Index = 0    Epoch: 0    Loss: 2.295648
Update Epoch: 0 [10/80 (12%)]   Loss: 2.237719
Update Epoch: 0 [20/80 (25%)]   Loss: 2.212439
Update Epoch: 0 [30/80 (38%)]   Loss: 2.156245
Update Epoch: 0 [40/80 (50%)]   Loss: 2.132013
Update Epoch: 0 [50/80 (62%)]   Loss: 1.966263
Update Epoch: 0 [60/80 (75%)]   Loss: 2.014680
Update Epoch: 0 [70/80 (88%)]   Loss: 1.959164
Update Epoch: 0 [80/80 (100%)]  Loss: 2.071570
Client Index = 0    Epoch: 0    Loss: 2.093762
Update Epoch: 0 [10/10 (100%)]  Loss: 2.196219
Client Index = 0    Epoch: 0    Loss: 2.196219
Update Epoch: 0 [10/40 (25%)]   Loss: 2.186752
Update Epoch: 0 [20/40 (50%)]   Loss: 2.137678
Update Epoch: 0 [30/40 (75%)]   Loss: 2.035772
Update Epoch: 0 [40/40 (100%)]  Loss: 2.068833
Client Index = 0    Epoch: 0    Loss: 2.107259
Update Epoch: 0 [10/30 (33%)]   Loss: 2.131074
Update Epoch: 0 [20/30 (67%)]   Loss: 2.123323
Update Epoch: 0 [30/30 (100%)]  Loss: 2.080403
Client Index = 0    Epoch: 0    Loss: 2.111600
Update Epoch: 0 [10/20 (50%)]   Loss: 2.237621
Update Epoch: 0 [20/20 (100%)]  Loss: 2.087353
Client Index = 0    Epoch: 0    Loss: 2.162487
Update Epoch: 0 [10/220 (5%)]   Loss: 2.289883
Update Epoch: 0 [20/220 (9%)]   Loss: 2.208545
Update Epoch: 0 [30/220 (14%)]  Loss: 2.049766
Update Epoch: 0 [40/220 (18%)]  Loss: 2.067477
Update Epoch: 0 [50/220 (23%)]  Loss: 2.129515
Update Epoch: 0 [60/220 (27%)]  Loss: 2.089700
Update Epoch: 0 [70/220 (32%)]  Loss: 1.997110
Update Epoch: 0 [80/220 (36%)]  Loss: 1.985615
Update Epoch: 0 [90/220 (41%)]  Loss: 2.026330
Update Epoch: 0 [100/220 (45%)] Loss: 1.942877
Update Epoch: 0 [110/220 (50%)] Loss: 1.951413
Update Epoch: 0 [120/220 (55%)] Loss: 1.951839
Update Epoch: 0 [130/220 (59%)] Loss: 2.005044
Update Epoch: 0 [140/220 (64%)] Loss: 1.915352
Update Epoch: 0 [150/220 (68%)] Loss: 1.898417
Update Epoch: 0 [160/220 (73%)] Loss: 1.872110
Update Epoch: 0 [170/220 (77%)] Loss: 1.942758
Update Epoch: 0 [180/220 (82%)] Loss: 1.993931
Update Epoch: 0 [190/220 (86%)] Loss: 1.771066
Update Epoch: 0 [200/220 (91%)] Loss: 1.955911
Update Epoch: 0 [210/220 (95%)] Loss: 1.898305
Update Epoch: 0 [220/220 (100%)]    Loss: 1.877099
Client Index = 0    Epoch: 0    Loss: 1.991821
Update Epoch: 0 [10/30 (33%)]   Loss: 2.108855
Update Epoch: 0 [20/30 (67%)]   Loss: 2.077567
Update Epoch: 0 [30/30 (100%)]  Loss: 1.969054
Client Index = 0    Epoch: 0    Loss: 2.051826
Update Epoch: 0 [10/60 (17%)]   Loss: 2.313509
Update Epoch: 0 [20/60 (33%)]   Loss: 2.164576
Update Epoch: 0 [30/60 (50%)]   Loss: 2.042380
Update Epoch: 0 [40/60 (67%)]   Loss: 2.042106
Update Epoch: 0 [50/60 (83%)]   Loss: 2.052663
Update Epoch: 0 [60/60 (100%)]  Loss: 1.919023
Client Index = 0    Epoch: 0    Loss: 2.089043
Update Epoch: 0 [10/120 (8%)]   Loss: 2.017837
Update Epoch: 0 [20/120 (17%)]  Loss: 2.062440
Update Epoch: 0 [30/120 (25%)]  Loss: 1.939170
Update Epoch: 0 [40/120 (33%)]  Loss: 2.007145
Update Epoch: 0 [50/120 (42%)]  Loss: 2.001268
Update Epoch: 0 [60/120 (50%)]  Loss: 1.912833
Update Epoch: 0 [70/120 (58%)]  Loss: 1.924959
Update Epoch: 0 [80/120 (67%)]  Loss: 1.911851
Update Epoch: 0 [90/120 (75%)]  Loss: 1.884201
Update Epoch: 0 [100/120 (83%)] Loss: 1.891351
Update Epoch: 0 [110/120 (92%)] Loss: 1.906423
Update Epoch: 0 [120/120 (100%)]    Loss: 1.784800
Client Index = 0    Epoch: 0    Loss: 1.937023
################local_test_on_all_clients : 0
{'training_acc': 0.5150006486766995, 'training_loss': 2.113317905002591}
{'test_acc': 0.5147198480531814, 'test_loss': 2.1109883090628485}

Shouldn't each Client Index instead be the corresponding element of the sampled index set [993 859 298 553 672 971 27 231 306 706]?
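To make the expectation concrete, here is a minimal sketch of per-round client sampling and the log lines one would expect to see. It is not FedML's code: sample_clients and expected_log_lines are hypothetical helpers, and seeding the generator with the round index is an assumption made only to keep the example reproducible.

```python
import numpy as np


def sample_clients(round_idx, client_num_in_total, client_num_per_round):
    """Pick the clients for one round (hypothetical helper, not FedML's API)."""
    if client_num_per_round >= client_num_in_total:
        # Every client participates, so no sampling is needed.
        return np.arange(client_num_in_total)
    # Assumption: seed with the round index so each round is reproducible.
    rng = np.random.default_rng(round_idx)
    return rng.choice(client_num_in_total, size=client_num_per_round, replace=False)


def expected_log_lines(client_indexes, epoch, losses):
    """Format one summary line per client, labelled with its sampled index."""
    return [
        f"Client Index = {idx}\tEpoch: {epoch}\tLoss: {loss:.6f}"
        for idx, loss in zip(client_indexes, losses)
    ]


client_indexes = sample_clients(round_idx=0, client_num_in_total=1000, client_num_per_round=10)
for line in expected_log_lines(client_indexes, epoch=0, losses=[0.0] * len(client_indexes)):
    print(line)
```

With this labelling, each summary line carries one of the sampled indexes rather than a constant 0.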

chaoyanghe commented 2 years ago

@mh-lan Thanks for your feedback. This is only a logging issue and has no impact on accuracy. I've just fixed it on the dev branch, and the fix will be released to master later this week.

https://github.com/FedML-AI/FedML/commit/5439f4ca31c7cbb3b1cba1721a770aa63f1488fc
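As an illustration of how a logging-only bug of this kind can arise, the sketch below reuses one trainer object whose id keeps its default value, so every summary line prints 0 even though a different client is processed each time. This is an assumed mechanism for illustration only; it does not reproduce the linked commit, and LocalTrainer, set_id, and run_round are hypothetical names.

```python
class LocalTrainer:
    """Hypothetical trainer that labels its log lines with self.id."""

    def __init__(self):
        self.id = 0  # default value; stale if nobody updates it

    def set_id(self, client_idx):
        self.id = client_idx

    def log_line(self, epoch, loss):
        return f"Client Index = {self.id}\tEpoch: {epoch}\tLoss: {loss:.6f}"


def run_round(client_indexes, update_id):
    """Simulate one round; with update_id=False the logged id stays at 0."""
    trainer = LocalTrainer()  # a single trainer reused for every client
    lines = []
    for client_idx in client_indexes:
        if update_id:
            # The fix in this sketch: refresh the label before logging.
            trainer.set_id(client_idx)
        # Real training on client_idx's data would happen here; the loss
        # is a placeholder because only the label matters in this sketch.
        lines.append(trainer.log_line(epoch=0, loss=0.0))
    return lines


print(run_round([993, 859, 298], update_id=False)[0])
print(run_round([993, 859, 298], update_id=True)[0])
```

In this sketch the data each client trains on never depends on the logged id, which is why a stale label would leave accuracy unaffected.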

mh-lan commented 2 years ago

Many thanks.