Closed: gxcuit closed this issue 2 years ago
Thanks, we fixed this issue a long time ago in this commit, and it was probably reintroduced in this commit.
Would you like to make a pull request to fix this again? @gxcuit
I have re-submitted the PR and removed the comments.
Please kindly refer to #3821
Describe the bug In the Keras-backend NN model, not all trainable parameters are aggregated (sent). This makes homo federated learning less effective. The PyTorch-backend NN does not have this issue.
For instance, in the given example, the number of trainable parameters is 31 (30 kernel weights and 1 bias).
However, only one variable (dense/bias) is aggregated.
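For reference, here is a minimal sketch of where the 31 comes from, assuming a single Dense(1) layer on 30 input features (the actual example model may differ):

```python
# Minimal sketch, assuming a single Dense(1) layer on 30 input features
# (the actual example model may differ): 30 kernel weights + 1 bias = 31.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(30,)),
    tf.keras.layers.Dense(1, activation="sigmoid", name="dense"),
])

print(model.count_params())          # 31 trainable scalars
for v in model.trainable_variables:  # dense/kernel:0 (30, 1), dense/bias:0 (1,)
    print(v.name, v.shape)
```

All of these variables should show up on the aggregation side, not just the bias.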
To Reproduce Steps to reproduce the behavior:
1. FATE standalone 1.7.0.
2. Print the aggregated weights by adding LOGGER code at the following lines (a sketch of the logging is shown after this list): https://github.com/FederatedAI/FATE/blob/018d051f06298cd01aec957d569ff5760ff0070e/python/federatedml/nn/homo_nn/_version_0.py#L184-L185
3. Run the Keras example and the PyTorch example respectively. Note that the PyTorch example has a typo: the initiator's party id is wrong; see #3814 and #3815.
4. Check the corresponding logs. The Keras backend aggregates only one variable, while the PyTorch backend aggregates all variables.
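For step 2, this is roughly the kind of logging I added; the variable name `agg_weights` is a placeholder, not the actual code at those lines:

```python
# Rough sketch of the debug logging; `agg_weights` stands in for whatever
# dict of weights is sent to / received from the aggregator at that point.
LOGGER.debug("aggregated weight keys: {}".format(list(agg_weights.keys())))
LOGGER.debug("aggregated weight shapes: {}".format(
    {k: getattr(v, "shape", None) for k, v in agg_weights.items()}
))
```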
Expected behavior The Keras-backend homo-nn should behave the same as the PyTorch backend.
Additional context I think this bug may be caused by the following code: https://github.com/FederatedAI/FATE/blob/018d051f06298cd01aec957d569ff5760ff0070e/python/federatedml/nn/backend/tf_keras/nn_model.py#L124-L126
I don't understand why the device name needs to be trimmed here.
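To illustrate the failure mode I suspect (this is not FATE's actual code, and `trim_name` below is a hypothetical helper): if the weights are collected into a dict keyed by a trimmed variable name, any two variables whose names trim to the same key collapse into a single entry, and only the last one survives, which would match seeing only dense/bias.

```python
# Hypothetical illustration of a name-trimming collision; `trim_name` is an
# assumption, not the real helper in tf_keras/nn_model.py.
import tensorflow as tf

def trim_name(name: str) -> str:
    # Suppose trimming keeps only the part before the first "/".
    return name.split("/")[0]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(30,)),
    tf.keras.layers.Dense(1, activation="sigmoid", name="dense"),
])

weights = {trim_name(v.name): v.numpy() for v in model.trainable_variables}
print(list(weights.keys()))  # ['dense'] -- kernel and bias collided; only the
                             # last variable (the bias) is left to aggregate
```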