KarhouTam / FL-bench

Benchmark of federated learning. Dedicated to the community. 🤗
GNU General Public License v3.0

python generate_data.py -d medmnistC -a 0.1 -cn 100 #49

Closed elegy112138 closed 11 months ago

elegy112138 commented 12 months ago

When I use the command python generate_data.py -d medmnistC -a 0.1 -cn 100, it takes a long time to execute and seems to fail because I previously used python generate_data.py -d medmnistC -a 0.5 -cn 100. Do you know how to resolve this issue?

KarhouTam commented 12 months ago

Hi, @elegy112138. If you want to keep the -a 0.1 -cn 100 settings, you can decrease --least_samples (40 by default) and run again. The Dirichlet partition scheme can run for an unpredictably long time when a small --alpha is combined with a large -cn and --least_samples.
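To illustrate why this can happen, here is a minimal sketch of a Dirichlet partition with a minimum-size check. It is not FL-bench's actual code: the function name `dirichlet_partition`, its parameters, and the `max_tries` cap are assumptions made for illustration. The idea is that the whole partition is redrawn until every client holds at least `least_samples` samples, and with a small `alpha` and many clients a valid draw can be very rare.

```python
import numpy as np


def dirichlet_partition(labels, client_num, alpha, least_samples,
                        seed=42, max_tries=1000):
    """Split sample indices across clients using per-class Dirichlet shares.

    Hypothetical helper: redraws the whole partition until every client
    holds at least `least_samples` samples, and raises after `max_tries`
    attempts instead of looping forever.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    for attempt in range(1, max_tries + 1):
        client_indices = [[] for _ in range(client_num)]
        for c in classes:
            idx = np.flatnonzero(labels == c)
            rng.shuffle(idx)
            # Share of class c that each client receives.
            proportions = rng.dirichlet(np.full(client_num, alpha))
            # Turn cumulative shares into split points inside idx.
            cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
            for cid, chunk in enumerate(np.split(idx, cuts)):
                client_indices[cid].extend(chunk.tolist())
        # Accept the draw only if the smallest client is large enough.
        if min(len(ix) for ix in client_indices) >= least_samples:
            return client_indices, attempt

    raise RuntimeError(
        f"no valid partition after {max_tries} tries; "
        "consider a larger alpha, fewer clients, or a smaller least_samples"
    )


if __name__ == "__main__":
    # Toy labels: 10 classes with 500 samples each (made-up data).
    toy_labels = np.repeat(np.arange(10), 500)
    parts, tries = dirichlet_partition(toy_labels, client_num=10,
                                       alpha=1.0, least_samples=10)
    print("attempts:", tries, "smallest client:", min(len(p) for p in parts))
```

Without a cap such as `max_tries`, a loop of this kind simply keeps redrawing, which would look like the script hanging.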

elegy112138 commented 12 months ago

I used to run the command python generate_data.py -d medmnistC -a 0.1 -cn 100 quickly before, but now it has suddenly become very slow. I don't want to change the value of --least_samples. Is it possible to make this command run successfully by waiting for a long time?

KarhouTam commented 12 months ago

"I used to run the command python generate_data.py -d medmnistC -a 0.1 -cn 100 quickly before"

Could you share some information about that previous code, such as commit details?

elegy112138 commented 12 months ago

I apologize, but I may not be able to submit. However, I can run this command quickly on other datasets, except for medmnistC now

KarhouTam commented 12 months ago

No need to apologize. It's okay. 😂 I have barely changed the Dirichlet partitioning scheme, and when I tried those partition settings in my workspace just now, the program got stuck as well, so I am curious about that previous code. BTW, you can also try changing the random seed.
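As a rough way to see how much the seed matters, the sketch below counts how many Dirichlet redraws are needed before every client reaches `least_samples`. It simulates client sample counts only, not real indices, and it rounds shares down, so it approximates rather than reproduces any real partitioner. The function name `attempts_until_valid`, the class sizes, and the `max_tries` cap are all assumptions for illustration.

```python
import numpy as np


def attempts_until_valid(sizes_per_class, client_num, alpha,
                         least_samples, seed, max_tries=200):
    """Count redraws until every client gets >= least_samples samples.

    Hypothetical helper: only per-client sample counts are simulated.
    Returns None when no valid draw is found within max_tries.
    """
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes_per_class)
    for attempt in range(1, max_tries + 1):
        # One row of client shares per class.
        props = rng.dirichlet(np.full(client_num, alpha), size=len(sizes))
        # Approximate per-client totals by flooring each class share.
        counts = np.floor(props * sizes[:, None]).sum(axis=0)
        if counts.min() >= least_samples:
            return attempt
    return None


if __name__ == "__main__":
    # Made-up class sizes; substitute the real per-class counts.
    class_sizes = [500] * 10
    for seed in range(5):
        tries = attempts_until_valid(class_sizes, client_num=100, alpha=0.1,
                                     least_samples=40, seed=seed)
        print(f"seed={seed}: attempts={tries}")
```

A result of `None` for a seed means no valid draw was found within the cap, which suggests the settings are hard to satisfy for that seed rather than that the program is broken.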

elegy112138 commented 12 months ago

I'm currently running comparative experiments, and the previous experiments were conducted with the following dataset_args:

'dataset_args': {
    'dataset': 'medmnistC',
    'client_num': 100,
    'fraction': 0.5,
    'seed': 42,
    'split': 'sample',
    'alpha': 0.1,
    'least_samples': 40
}

I don't want to change these parameters to maintain consistency as a control variable. Are you also experiencing a slowdown in running the data partitioning command now?

KarhouTam commented 12 months ago

Running with your settings took only 3 s.

KarhouTam commented 11 months ago

Closing this issue because there has been no response for a long time.