Naiftt / SPAFD

Official implementation of the paper "Suppressing Poisoning Attacks on Federated Learning for Medical Imaging", accepted at MICCAI 2022

Question about setting batch size in HAM10000 dataset - Code Crashes #2

Closed priyankupadhya17 closed 1 year ago

priyankupadhya17 commented 1 year ago

In the paper "Suppressing Poisoning Attacks on Federated Learning for Medical Imaging" it is mentioned that the batch size for the HAM10000 dataset is 890 for each client.

However,

[Screenshot: per-client image counts for the non-IID HAM10000 split]

In the above image we can see that different clients have different numbers of images for the HAM10000 dataset (non-IID), and some clients have very few images.

So my question is: how can the batch size be set to 890 when the first client has only about 500 images (the code crashes)? Were the settings for the clients different when the code was run with a batch size of 890?

Naiftt commented 1 year ago

Hello, thank you for your interest in our work. Yes, the batch size is 890, but it also depends on the number of samples each client has: if a client has fewer than 890 samples, that smaller number is used as the batch size. Which part of the code is crashing for you? Thank you.

priyankupadhya17 commented 1 year ago

Thank you for the reply.

This is the part of code where I get the error:

File "main.py", line 252, in learn inputs, labels = adata[0][0][random_indices].to(self.device), adata[0][1][random_indices].to(self.device) IndexError: index 772 is out of bounds for dimension 0 with size 436

So my assumption is that, since only 436 images are present for that client, any index at or above 436 (the batch size being 890) crashes the code. However, if I were to draw random_indices from only min(batch_size, number_of_images_client) samples, then I think it would work correctly, as in the sketch below.
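
For reference, a minimal sketch of that fix, reusing the names from the traceback (`adata`, `batch_size`, and `self.device` are assumptions about the surrounding code in main.py, not the repository's actual implementation):

```python
import torch

# Hypothetical sketch: clamp the effective batch size to the number of samples
# this client actually holds before drawing random indices.
num_samples = adata[0][0].shape[0]                    # images held by this client, e.g. 436
effective_batch = min(batch_size, num_samples)        # min(890, 436) -> 436
random_indices = torch.randperm(num_samples)[:effective_batch]

inputs = adata[0][0][random_indices].to(self.device)  # indices now stay in bounds
labels = adata[0][1][random_indices].to(self.device)
```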

Regards, Priyank

Naiftt commented 1 year ago

I just cloned the repository and ran it for 50 rounds, and it seems to be working correctly. Also, a PyTorch DataLoader should automatically yield all of a client's samples as a single smaller batch when there are fewer samples than the chosen batch size.
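
(A small self-contained illustration of that DataLoader behaviour with the default drop_last=False; dummy data, not code from this repository:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 436 dummy samples with 7 classes; batch_size is larger than the dataset.
dataset = TensorDataset(torch.randn(436, 3, 28, 28), torch.randint(0, 7, (436,)))
loader = DataLoader(dataset, batch_size=890, shuffle=True, drop_last=False)

for inputs, labels in loader:
    print(inputs.shape[0])  # prints 436: the whole dataset arrives as one smaller batch
```
)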

priyankupadhya17 commented 1 year ago

Thanks for checking it again so swiftly. I am using the following commands (I hope they are right):

python main.py --method COPOD --numOfAgents 10 --numOfClasses 7 --data noniid_skincancer --modelName ConvSkin --numOfAttacked 0 --local_steps 5 --numOfRounds 250 --seed 2 --lr 0.01 --B 890

python main.py --method COPOD --numOfAgents 10 --numOfClasses 7 --data noniid_skincancer --modelName ConvSkin --numOfAttacked 4 --local_steps 5 --numOfRounds 250 --seed 2 --lr 0.01 --B 890 --Attack True --AttackInfo "{0:'random_weight',1:'random_weight', 2:'scaled_weight100', 3:'opposite_weight0.5'}"

Both commands crash exactly where I mentioned.

Naiftt commented 1 year ago

Thank you for your effort! I fixed the issue, so you can pull the code again; the correct handling for the case where the batch size is higher than the number of data samples a client has had not been pushed. Also, if you could please star the repo, it would be much appreciated. Thank you! Let me know if I should close the issue.

priyankupadhya17 commented 1 year ago

Yes, it works now. You can close the issue. Thank you :)