Koukyosyumei / RPDPKDFL

Code for Reconstruct Private Data via Public Knowledge in Distillation-based Federated Learning

Reviews #1

Open Koukyosyumei opened 2 years ago

Koukyosyumei commented 2 years ago

Strengths

I like the intuition of using confidence gaps (obtained through logits only) to approximate the original private model, but there should be more details about the inversion model G. My understanding is that G takes both the public data and the predictions of the global model and the local model on the public data. By minimizing equation 10, G can approximate the original private model and can be further used for data reconstruction through model inversion. Could the authors provide more explanation here?

-> His understanding is correct, and we need to add more explanation of how G is trained (a rough sketch is below).
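
As a note for the revision, here is a minimal sketch of our understanding of how the inversion model G could be trained, assuming a PyTorch-style implementation. `G`, `global_model`, `local_model`, and `public_loader` are hypothetical placeholders, and the MSE term only stands in for the actual objective in equation 10.

```python
import torch
import torch.nn.functional as F

def train_inversion_model(G, global_model, local_model, public_loader,
                          epochs=10, lr=1e-3, device="cpu"):
    """Rough sketch: fit G on public data and the two models' predictions."""
    optimizer = torch.optim.Adam(G.parameters(), lr=lr)
    global_model.eval()
    local_model.eval()
    for _ in range(epochs):
        for x_pub, _ in public_loader:
            x_pub = x_pub.to(device)
            with torch.no_grad():
                # The attacker only sees predictions on public data,
                # never gradients or the private data itself.
                p_global = torch.softmax(global_model(x_pub), dim=1)
                p_local = torch.softmax(local_model(x_pub), dim=1)
            # G takes the public input and both prediction vectors
            # (the confidence gap) and outputs a reconstruction.
            x_rec = G(x_pub, torch.cat([p_global, p_local], dim=1))
            loss = F.mse_loss(x_rec, x_pub)  # stand-in for equation 10
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return G
```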

Weaknesses

The related work section is not well organized. The authors shall carefully place their work in the literature.

For example, why is 'federated learning with knowledge distillation' an important federated learning setting to consider?

-> We may have to add an example of a real-world setting.

The authors may briefly discuss other solutions to prevent information leakage of federated learning.

-> Differential privacy (loses accuracy), or homomorphic encryption / MPC (high computational complexity). A rough sketch of a DP-style defense on shared logits is below.
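
For context on that reply, a minimal, hypothetical sketch of how a DP-style defense could look in this setting: the client perturbs the logits it uploads. This only illustrates the accuracy/privacy trade-off; it is not a calibrated (epsilon, delta)-DP mechanism, and `clip_norm` / `sigma` are made-up parameters.

```python
import torch

def noisy_logits(logits: torch.Tensor, clip_norm: float = 1.0,
                 sigma: float = 0.5) -> torch.Tensor:
    # Clip each logit vector to bound its sensitivity, then add Gaussian noise.
    norms = logits.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = logits * (clip_norm / norms).clamp(max=1.0)
    return clipped + sigma * clip_norm * torch.randn_like(clipped)
```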

Moreover, the connection between model inversion attack variants (2.2.1 - 2.2.3) shall be discussed.

-> We have to survey it.

Some citations are missing or misused. For example, DP-FL, FedMD, and FedGEMS were not cited in the experiment (Sec 4.1.2).

-> We have to add these citations

FedKD first appeared in the Introduction without a citation.

-> We use FedKD as the abbreviation of Federated Learning with Knowledge Distillation. Is it not clear?

Koukyosyumei commented 2 years ago

While these contributions are original and new, they are still relatively incremental compared to previous work. Also, could you state more explicitly whether the different terms in equations 5 and 11 are an original contribution, or whether other papers also consider these kinds of terms (in particular the use of in equation 5 and the use of SSIM and TV in equation 11)? For example, TV is often used as a regularizer in the loss of reconstruction attacks, so the authors should state explicitly whether its use here constitutes a contribution of their own or just a re-use of a well-known technique.

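For reference in our reply, a minimal sketch of how SSIM and total-variation terms typically enter a reconstruction loss such as equation 11. The weights `alpha` and `beta` and the helper names are hypothetical; `ssim_fn` is assumed to return a similarity in [0, 1] (e.g. `torchmetrics.functional.structural_similarity_index_measure`).

```python
import torch
import torch.nn.functional as F

def total_variation(x: torch.Tensor) -> torch.Tensor:
    # Anisotropic TV prior over a batch of images shaped (B, C, H, W).
    tv_h = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
    tv_w = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    return tv_h + tv_w

def reconstruction_loss(x_rec, x_ref, ssim_fn, alpha=0.1, beta=0.01):
    # Pixel-wise fidelity + SSIM similarity + TV smoothness prior.
    mse = F.mse_loss(x_rec, x_ref)
    ssim_term = 1.0 - ssim_fn(x_rec, x_ref)
    return mse + alpha * ssim_term + beta * total_variation(x_rec)
```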

It would be better to state explicitly that the FedMD, FedGEMS, and DSFL pseudo-codes are provided in the appendix. The reader might not know what these algorithms are.

-> We can easily fix this.

What are the training tasks considered by the clients, and which loss are they using?

-> We can easily fix this.

The data that is in the public dataset is not clear. In the case of LAG, does the central server have young pictures for all the labels and both the young and adult pictures for part of the labels? Does the central server exploit in any way the young pictures of the labels whose adult counterpart is not in the public dataset? How? The same question applies in the FLW case. Figure S-1 in the appendix seems to imply that this is the case: the central server has masked pictures of all the celebrities.

What is shown in Figure 4?

Does the server use in any way the masked/young pictures of the labels corresponding to the private datasets?

Out of curiosity, I would like to know whether the authors have considered presenting their work with another narrative: first claiming that using an auxiliary dataset and PTBI can improve TBI attacks, and then showing that such a framework naturally applies to FedKD.

The paper does not consider possible defenses and mitigations against such an attack. Although I don't think that numerical examples on such defenses are required, it would be nice to list the mitigations that might be interesting to study in future work. Also, a suggestion would be to stress more the restrictive setting of the numerical examples, and to justify whether such a setting might be present in real-world applications.

Koukyosyumei commented 2 years ago

The authors claim that they focus on privacy in Distillation-based Federated Learning. They only conduct experiments on logits-based distillation with a public dataset, but ignore the comparison with distillation methods that do not require public datasets (see papers [1, 2]). The authors should compare with these methods.

-> These methods communicate gradients, and we assume that the malicious server cannot access gradients, which are a well-known cause of privacy leakage.

Considering privacy policies and the difficulty of acquiring training data, using public datasets to support FL training is extremely infeasible. Besides, logits-based distillation with a public dataset still performs poorly in FL compared with methods like FedAvg and FedProx, which makes this approach ineffective.

Experimental results are not very convincing. The authors should show results on more datasets. Also, if the domain gap between the public dataset and the private dataset increases, what will be the outcome of your method? It would be better if the authors included an experiment to discuss the impact of the domain gap.

-> We can quantify the domain gap with H-divergence (see https://arxiv.org/ftp/arxiv/papers/1909/1909.11972.pdf); a rough sketch of the estimate is below.
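
A rough sketch of that idea: estimate the proxy A-distance (a standard estimator related to H-divergence) by training a domain classifier to separate public from private samples. The function name and the precomputed feature arrays are placeholders; this is an illustration, not part of the current experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(public_feats: np.ndarray, private_feats: np.ndarray) -> float:
    # Label public samples 0 and private samples 1, then see how well a
    # simple classifier can tell the two domains apart.
    X = np.vstack([public_feats, private_feats])
    y = np.concatenate([np.zeros(len(public_feats)), np.ones(len(private_feats))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_te, y_te)
    # 0 means the domains are indistinguishable; 2 means fully separable.
    return 2.0 * (1.0 - 2.0 * err)
```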

In the experiments, the authors set the number of clients to 1 or 10 and the number of communication rounds to 5. I am not sure if this is reasonable. In traditional federated learning, there are a large number of clients and communication rounds. Could the authors explain why it is set this way?

-> We have to add examples of real-world settings and cite the related works.

There is only one baseline method, TBI. That is not fair. I suggest that the authors add more methods for comparison. For example, DeepInversion [3] seems to be a very interesting method.

-> DeepInversion needs access to the gradients, which are not accessible in our setting.

[1] Wu C, Wu F, Liu R, et al. FedKD: Communication-efficient federated learning via knowledge distillation. arXiv preprint arXiv:2108.13323, 2021.

[2] Lin T, Kong L, Stich S U, et al. Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 2020, 33: 2351-2363.

[3] Yin H, Molchanov P, Alvarez J M, et al. Dreaming to distill: Data-free knowledge transfer via DeepInversion. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 8715-8724.