FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

Bugs in keras-based homo_nn: Does not aggregate(send) all trainable params(`_trainable_weights`) ? #3816

Closed gxcuit closed 2 years ago

gxcuit commented 2 years ago

Describe the bug In the Keras-backend NN model, not all trainable parameters (`_trainable_weights`) are aggregated (sent). This makes homo federated learning less effective. The PyTorch-backend NN does not have this issue.

For instance, in the given example, the number of trainable variables is 31 (30 kernel weights and 1 bias).

However, only 1 variable (dense/bias) is aggregated.

To Reproduce Steps to reproduce the behavior:

  1. FATE standalone 1.7.0

  2. Print the aggregated weights by adding LOGGER code in the following file: https://github.com/FederatedAI/FATE/blob/018d051f06298cd01aec957d569ff5760ff0070e/python/federatedml/nn/homo_nn/_version_0.py#L184-L185

  3. Run the Keras example and the PyTorch example respectively. Note that the PyTorch example has a typo: the initiator's party ID is wrong; see #3814 and #3815.

  4. See the corresponding logs. The Keras backend aggregates only one variable, while the PyTorch backend aggregates all variables.
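The logging added in step 2 can be sketched roughly as below. The function name and the assumption that the aggregated weights arrive as a name-to-array mapping are mine for illustration; they are not FATE's actual API:

```python
import logging

LOGGER = logging.getLogger("homo_nn_debug")
logging.basicConfig(level=logging.INFO)


def log_aggregated_weights(weights):
    """Log which variables were actually aggregated.

    `weights` is assumed to be a mapping from variable name to its
    parameter array; this mirrors what the LOGGER lines added in
    _version_0.py are meant to reveal.
    """
    names = sorted(weights)
    LOGGER.info("aggregated %d variable(s): %s", len(weights), names)
    return names


# With the Keras backend, a dump like this showed only the bias surviving:
log_aggregated_weights({"dense/bias:0": [0.0]})
```

Comparing the two backends' logs produced this way makes the discrepancy obvious: the PyTorch run lists every parameter tensor, the Keras run lists one.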

Expected behavior The Keras-backend homo-nn should behave the same as the PyTorch backend.



Additional context I think this bug may be caused by the following code: https://github.com/FederatedAI/FATE/blob/018d051f06298cd01aec957d569ff5760ff0070e/python/federatedml/nn/backend/tf_keras/nn_model.py#L124-L126

I don't understand why the device name needs to be trimmed here.
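A minimal sketch of how over-aggressive name trimming could silently drop parameters. The helper `trim_name` and its split rule are assumptions for illustration, not the exact code in nn_model.py:

```python
def trim_name(name: str) -> str:
    # Hypothetical trimming rule: keep only the part before the first "/".
    # Distinct variable names such as "dense/kernel:0" and "dense/bias:0"
    # then collapse into the same key, "dense".
    return name.split("/")[0]


# Two distinct trainable variables of a single Dense layer.
weights = {
    "dense/kernel:0": [0.1] * 30,  # 30 kernel parameters
    "dense/bias:0": [0.0],         # 1 bias parameter
}

trimmed = {}
for name, value in weights.items():
    # Later entries overwrite earlier ones once their keys collide,
    # so only one variable is left to aggregate.
    trimmed[trim_name(name)] = value
```

If the trimming in the linked code collapses variable names like this, it would explain why only a single variable (the bias) shows up in the aggregation logs.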

sagewe commented 2 years ago

Thanks, we fixed this issue a long time ago in this commit, and it was probably reintroduced in this commit.

Would you like to make a pull request to fix this again? @gxcuit

gxcuit commented 2 years ago

> Thanks, we fixed this issue a long time ago in this commit, and it was probably reintroduced in this commit.
>
> Would you like to make a pull request to fix this again? @gxcuit

I have submitted the PR and removed the offending lines.

Please kindly refer to #3821