Cifar10 dataset setting get error with flexible number of client

codaibk commented 3 years ago

Hi, I am running the code in the federated learning with differential privacy folder. The cifar10 setup seems to work only with 10 clients (number of clients = number of classes). When the number of clients differs from the number of classes (for example, 20 clients while cifar10 has 10 classes), the system raises an error. I think the code needs to be edited to support a flexible number of clients. Do you have any suggestions? I changed the number of clients using the code below:

And I get this error:

Thanks.

codaibk commented 3 years ago

Hi, is there any update or suggestion for fixing this issue? Your cifar10 setup only works with 10 clients, not with a flexible number of clients. @ZacharyGarrett

xjiajiahao commented 3 years ago

> Hi, is there any update or suggestion for fixing this issue? Your cifar10 setup only works with 10 clients, not with a flexible number of clients. @ZacharyGarrett

Hi, I think there is a typo in the utils/datasets/cifar10_dataset.py file. The code on line 101 seems wrong:
https://github.com/google-research/federated/blob/42ec49634d9d27d0ac5d16820271d6d2cc5b55b9/utils/datasets/cifar10_dataset.py#L101
The correct line should be:

     for k in range(NUM_CLIENTS):

ZacharyGarrett commented 3 years ago

Thanks for investigating @xjiajiahao! Would you be willing to submit a pull request to make the change?

codaibk commented 3 years ago

@xjiajiahao @ZacharyGarrett Changing only that line will not fix the problem, because the code fills `train_client_samples` by indexing into `train_example_indices`: `train_client_samples[k].append(train_example_indices[sampled_label, train_count[sampled_label]])`. The size of `train_example_indices` is fixed by the number of examples in each class (5000 for train, 1000 for test). When you change `NUM_CLIENTS`, `NUM_EXAMPLES_PER_CLIENT` and `TEST_SAMPLES_PER_CLIENT` change too, and that is what causes the error.

    for k in range(NUM_CLIENTS):
      for i in range(NUM_EXAMPLES_PER_CLIENT):
        sampled_label = np.argwhere(
            np.random.multinomial(1, train_multinomial_vals[k, :]) == 1)[0][0]
        train_client_samples[k].append(
            train_example_indices[sampled_label, train_count[sampled_label]])
        train_count[sampled_label] += 1
        if train_count[sampled_label] == NUM_EXAMPLES_PER_CLIENT:
          train_multinomial_vals[:, sampled_label] = 0
          train_multinomial_vals = (
              train_multinomial_vals /
              train_multinomial_vals.sum(axis=1)[:, None])

With NUM_CLIENTS < 10, the error is:

    IndexError: index 5000 is out of bounds for axis 1 with size 5000

With NUM_CLIENTS > 10, the error is:

        np.random.multinomial(1, train_multinomial_vals[k, :]) == 1)[0][0]
      File "mtrand.pyx", line 4212, in numpy.random.mtrand.RandomState.multinomial
      File "_common.pyx", line 338, in numpy.random._common.check_array_constraint
    ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
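One reading of the two errors, based only on the loop quoted above: a label appears to be retired when its counter reaches `NUM_EXAMPLES_PER_CLIENT`, not when the class itself runs out of examples. With fewer than 10 clients that threshold exceeds 5000, so the index can run past the end of the class; with more than 10 clients every label is retired early, all probabilities become zero, and the renormalization produces NaNs. The sketch below retires a label only when its own pool is exhausted. It is an illustration under assumptions, not the fix that landed in the repository: the function name `partition_by_label`, the Dirichlet-distributed class probabilities, and the `alpha` and `seed` parameters are all hypothetical.

```python
import numpy as np


def partition_by_label(labels, num_clients, alpha=1.0, seed=0):
    """Assign example indices to clients, one sampled label at a time.

    Hypothetical helper: the Dirichlet class mix per client is an assumption.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Shuffled pool of example indices for each class; pools may differ in size.
    pools = [rng.permutation(np.flatnonzero(labels == c)) for c in classes]
    pool_sizes = np.array([len(p) for p in pools])
    used = np.zeros(len(classes), dtype=int)
    per_client = len(labels) // num_clients
    # One row of class probabilities per client.
    probs = rng.dirichlet(alpha * np.ones(len(classes)), size=num_clients)
    clients = [[] for _ in range(num_clients)]
    for k in range(num_clients):
        for _ in range(per_client):
            label = rng.choice(len(classes), p=probs[k])
            clients[k].append(int(pools[label][used[label]]))
            used[label] += 1
            # Retire a label when its own pool is empty, independent of
            # how many clients there are.
            if used[label] == pool_sizes[label]:
                probs[:, label] = 0.0
                row_sums = probs.sum(axis=1, keepdims=True)
                # Guard against 0/0 once every pool has been used up.
                probs = np.divide(probs, row_sums,
                                  out=np.zeros_like(probs),
                                  where=row_sums > 0)
    return clients
```

Because the total number of draws never exceeds the total number of examples, at least one label stays available until the final draw, so the index stays in bounds for any client count in this sketch.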

zcharles8 commented 3 years ago

@hsidahmed865 has kindly offered to take a look and potentially submit a fix, as they have been bumping up against this. Thanks @hsidahmed865!

zcharles8 commented 3 years ago

Hi @codaibk. This issue should have been fixed by commits 74fdc1680c33169714f577cdc3398c94d0326aff and 83b23c36a5fd29c4c89631a125d54947074699b4. Can you confirm whether or not this fixed your problem?

codaibk commented 3 years ago

Sorry, but it does not fix the problem. The problem is that this code can't handle a flexible number of clients, as I mentioned above. Your commits don't change anything in the algorithm; they only change the parameters. @zcharles8

zcharles8 commented 3 years ago

Hi @codaibk. Can you verify that your version of the repository includes the commits I listed above? They add the functionality that lets the user specify num_clients.

If so, can you run the following test using bazel: https://github.com/google-research/federated/blob/master/utils/datasets/cifar10_dataset_test.py

This test is passing for me, and explicitly tests num_clients = 8, num_clients = 10, and num_clients = 100.

codaibk commented 3 years ago

@zcharles8 It seems you also changed the run file run_federated.py in the differential privacy folder. The old file calls cifar10_dataset.py to generate the data. Could you tell me the command for running cifar10_dataset_test.py? Thanks.

zcharles8 commented 3 years ago

We recommend using Bazel (see https://bazel.build/). Once you have that configured, you can simply run `bazel test {path to test}:{test_name}` in order to run a test.

If you'd prefer not to use Bazel, you could call cifar10_dataset.load_cifar10_federated with different values of the num_clients argument, and make sure that you get a dataset with the requisite number of clients.

zcharles8 commented 3 years ago

Hi @codaibk. I am marking this as resolved for now, as it is working according to all of our tests. If you are still seeing errors, please post your full stack trace, as well as the commands that resulted in the error.

google-research / federated

Cifar10 dataset setting get error with flexible number of client #23