CharlieDinh / pFedMe

Personalized Federated Learning with Moreau Envelopes (pFedMe) using Pytorch (NeurIPS 2020)

pFedMe Optimizer Problem #12

Closed adam4096 closed 3 years ago

adam4096 commented 3 years ago

Hello, I want to ask:
Why are the code and the algorithm different? In fedoptimizer.py line 64 the update is

`p.data = p.data - group['lr'] * (p.grad.data + group['lamda'] * (p.data - localweight.data) + group['mu'] * p.data)`

but this does not look like Algorithm 1 line 8 in the paper (screenshot attached).

CharlieDinh commented 3 years ago

Hi.

Line 64 in the optimizer solves h_i (Equation 7 in our paper) to find the personalized model.

Algorithm 1 line 8 corresponds to userpFedMe.py lines 53-54, which update the local model:

`for new_param, localweight in zip(self.persionalized_model_bar, self.local_model): localweight.data = localweight.data - self.lamda * self.learning_rate * (localweight.data - new_param.data)`
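
In case it is useful, here is a minimal sketch (my own, not the repository's code) of the two steps side by side on plain tensors. `grad_f` is a hypothetical callable returning the gradient of the local loss at `theta`; `personal_lr` stands in for the inner `group['lr']`, `lr` for the outer `self.learning_rate`, and `lamda`/`mu` mirror the hyperparameters discussed in this thread.

```python
import torch

def pfedme_client_step(local_w, grad_f, personal_lr=0.01, lr=0.005,
                       lamda=15.0, mu=0.0, K=5):
    # Inner loop: K gradient-descent steps on
    # h_i(theta; w) = f_i(theta) + (lamda/2) * ||theta - w||^2  (+ optional mu L2 term),
    # which is what fedoptimizer.py line 64 does for each parameter tensor.
    theta = local_w.clone()
    for _ in range(K):
        theta = theta - personal_lr * (grad_f(theta) + lamda * (theta - local_w) + mu * theta)

    # Outer step (Algorithm 1 line 8, userpFedMe.py lines 53-54):
    # move the local model toward the personalized model theta.
    local_w = local_w - lamda * lr * (local_w - theta)
    return theta, local_w
```

For example, with `local_w = torch.zeros(3)` and `grad_f = lambda t: 2 * (t - torch.ones(3))` (i.e. f_i(theta) = ||theta - 1||^2), repeated calls move both the personalized model and the local model toward the all-ones vector.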

adam4096 commented 3 years ago

Thank you!

adam4096 commented 3 years ago

> Hi.
>
> Line 64 in the optimizer solves h_i (Equation 7 in our paper) to find the personalized model.
>
> Algorithm 1 line 8 corresponds to userpFedMe.py lines 53-54, which update the local model:
>
> `for new_param, localweight in zip(self.persionalized_model_bar, self.local_model): localweight.data = localweight.data - self.lamda * self.learning_rate * (localweight.data - new_param.data)`

Hello. For line 64 in the optimizer that solves h_i, did you modify NAG (Nesterov accelerated gradient) to find the personalized model? It doesn't look like a standard NAG update formula.

CharlieDinh commented 3 years ago

No. I just applied gradient descent for h_i.

adam4096 commented 3 years ago

> No. I just applied gradient descent for h_i.

But this update looks strange. Can you explain it? Thank you very much!

`p.data = p.data - group['lr'] * (p.grad.data + group['lamda'] * (p.data - localweight.data) + group['mu'] * p.data)`

CharlieDinh commented 3 years ago

`p.grad.data` is the gradient of f_bar. `group['lamda'] * (p.data - localweight.data)` is the gradient of the Moreau-envelope regularization term. `group['mu'] * p.data` is just an L2 (norm-2) regularization term, but mu = 0 means we don't use it. So `p.grad.data + group['lamda'] * (p.data - localweight.data)` is the gradient of h_i.
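
To restate that decomposition as formulas (my notation, not the paper's exact statement, assuming the Moreau-envelope term in h_i is (lambda/2)·||theta - w||^2, which is consistent with the lambda·(theta - w) gradient in the code): theta is `p.data`, w is `localweight.data`, eta is `group['lr']`, lambda is `group['lamda']`, and mu is `group['mu']`.

```latex
% h_i (Equation 7), plus the optional L2 term controlled by mu (mu = 0 disables it)
h_i(\theta; w) = \tilde{f}_i(\theta) + \frac{\lambda}{2}\lVert \theta - w \rVert^2 + \frac{\mu}{2}\lVert \theta \rVert^2

% its gradient, matching line 64 term by term
\nabla_\theta h_i(\theta; w) = \nabla \tilde{f}_i(\theta) + \lambda(\theta - w) + \mu\,\theta

% the gradient-descent step implemented in fedoptimizer.py line 64
\theta \leftarrow \theta - \eta\big(\nabla \tilde{f}_i(\theta) + \lambda(\theta - w) + \mu\,\theta\big)
```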