TsingZ0 / PFLlib

37 traditional FL (tFL) or personalized FL (pFL) algorithms, 3 scenarios, and 20 datasets.
GNU General Public License v2.0

Error when using DP #150

Closed FryLcm closed 9 months ago

FryLcm commented 9 months ago

I downloaded the project as-is and enabled differential privacy with the FedAvg algorithm, but got this error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm). Which version of opacus are you using? I am on opacus==1.4.

Algorithm: FedAvg
Local batch size: 10
Local steps: 1
Local learing rate: 0.005
Local learing rate decay: False
Total number of clients: 20
Clients join in each round: 1.0
Clients randomly join: False
Client drop rate: 0.0
Client select regarding time: False
Running times: 1
Dataset: mnist
Number of classes: 10
Backbone: cnn
Using device: cuda
Using DP: True
Sigma for DP: 1.3
Auto break: False
Global rounds: 1000
Cuda device id: 0
DLG attack: False
Total number of new clients: 0
Fine tuning epoches on new clients: 0

-------------Round number: 6-------------

Evaluate global model
Averaged Train Loss: 2.2341
Averaged Test Accurancy: 0.2832
Averaged Test AUC: 0.6044
Std Test Accurancy: 0.2771
Std Test AUC: 0.2570
Client 2 epsilon = 0.34, sigma = 1e-05
Client 7 epsilon = 0.40, sigma = 1e-05
Client 9 epsilon = 0.20, sigma = 1e-05
Client 13 epsilon = 0.29, sigma = 1e-05
Client 15 epsilon = 0.17, sigma = 1e-05
Client 6 epsilon = 0.40, sigma = 1e-05
Client 5 epsilon = 0.15, sigma = 1e-05
Client 19 epsilon = 0.16, sigma = 1e-05
Client 11 epsilon = 0.21, sigma = 1e-05
Client 0 epsilon = 0.25, sigma = 1e-05
Client 12 epsilon = 0.21, sigma = 1e-05
Client 17 epsilon = 0.16, sigma = 1e-05
Client 1 epsilon = 0.75, sigma = 1e-05
Client 8 epsilon = 0.21, sigma = 1e-05
Client 16 epsilon = 0.21, sigma = 1e-05
Client 10 epsilon = 0.23, sigma = 1e-05
Client 4 epsilon = 0.31, sigma = 1e-05
Client 14 epsilon = 0.16, sigma = 1e-05
Client 3 epsilon = 0.26, sigma = 1e-05
Client 18 epsilon = 0.17, sigma = 1e-05
------------------------- time cost -------------------------
147.04931139945984

-------------Round number: 7-------------

Evaluate global model
Averaged Train Loss: 2.2249
Averaged Test Accurancy: 0.2486
Averaged Test AUC: 0.6456
Std Test Accurancy: 0.2692
Std Test AUC: 0.2444
Client 16 epsilon = 0.21, sigma = 1e-05
Client 17 epsilon = 0.16, sigma = 1e-05
Traceback (most recent call last):
  File "C:\Users\Acer\Desktop\PFL-Non-IID-20231122\system\main.py", line 509, in <module>
    run(args)
  File "C:\Users\Acer\Desktop\PFL-Non-IID-20231122\system\main.py", line 327, in run
    server.train()
  File "C:\Users\Acer\Desktop\PFL-Non-IID-20231122\system\flcore\servers\serveravg.py", line 34, in train
    client.train()
  File "C:\Users\Acer\Desktop\PFL-Non-IID-20231122\system\flcore\clients\clientavg.py", line 44, in train
    self.optimizer.step()
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opacus\optimizers\optimizer.py", line 513, in step
    if self.pre_step():
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opacus\optimizers\optimizer.py", line 494, in pre_step
    self.clip_and_accumulate()
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opacus\optimizers\optimizer.py", line 412, in clip_and_accumulate
    grad = contract("i,i...", per_sample_clip_factor, grad_sample)
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opt_einsum\contract.py", line 507, in contract
    return _core_contract(operands, contraction_list, backend=backend, **einsum_kwargs)
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opt_einsum\contract.py", line 573, in _core_contract
    new_view = _tensordot(tmp_operands, axes=(tuple(left_pos), tuple(right_pos)), backend=backend)
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opt_einsum\sharing.py", line 131, in cached_tensordot
    return tensordot(x, y, axes, backend=backend)
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opt_einsum\contract.py", line 374, in _tensordot
    return fn(x, y, axes=axes)
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\opt_einsum\backends\torch.py", line 54, in tensordot
    return torch.tensordot(x, y, dims=axes)
  File "C:\Users\Acer\anaconda3\envs\fedcp\lib\site-packages\torch\functional.py", line 1100, in tensordot
    return _VF.tensordot(a, b, dims_a, dims_b)  # type: ignore[attr-defined]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

Process finished with exit code 1

FryLcm commented 9 months ago

Solved. In torch\functional.py, right before the line
return _VF.tensordot(a, b, dims_a, dims_b)  # type: ignore[attr-defined]
add:
a = a.cuda()
b = b.cuda()
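Editing torch\functional.py inside site-packages works, but the change is lost on every reinstall. An alternative is to apply the same coercion as a monkey-patch from the training script's entry point. Below is a minimal, self-contained sketch of that wrapper pattern using a stand-in function (so it runs without PyTorch); with PyTorch installed, the same pattern would wrap torch.tensordot and call .cuda() on both operands, exactly as in the fix above. All names here are illustrative, not part of PFLlib or Opacus.

```python
# Sketch of the monkey-patch pattern: wrap a library function so both
# operands are coerced onto the same "device" before the original runs.
# `tensordot` below is a stand-in that mimics the RuntimeError from the
# traceback; with PyTorch the wrapped target would be torch.tensordot.

def tensordot(a, b):
    # Stand-in library function: refuses mixed-device operands.
    if a["device"] != b["device"]:
        raise RuntimeError("Expected all tensors to be on the same device")
    return {"device": a["device"], "value": a["value"] * b["value"]}

_orig_tensordot = tensordot  # keep a reference to the original

def _patched_tensordot(a, b):
    # Move both operands to the GPU tag before delegating, mirroring
    # the `a = a.cuda(); b = b.cuda()` workaround.
    a = {**a, "device": "cuda:0"}
    b = {**b, "device": "cuda:0"}
    return _orig_tensordot(a, b)

tensordot = _patched_tensordot  # install the patch

# Mixed-device inputs now succeed instead of raising.
x = {"device": "cuda:0", "value": 6}
y = {"device": "cpu", "value": 7}
print(tensordot(x, y)["value"])  # prints 42
```

Note that silently moving tensors to the GPU hides the device mismatch rather than curing its cause (some Opacus gradient buffer ending up on CPU), so treat it as a stopgap.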

TsingZ0 commented 9 months ago

Each release of the opacus package tends to be incompatible with the previous ones, so you may need to downgrade. That said, I still recommend the latest version, since it is more computationally efficient.
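If a downgrade is needed, it is a one-line pinned install. The thread does not establish which older Opacus release actually works with this codebase (1.4 is merely the version the reporter tried, and it still failed), so the version below is purely a placeholder:

```shell
# Placeholder pin: substitute whichever opacus release proves compatible.
pip install "opacus==1.4.0"
```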