HewlettPackard / swarm-learning

A simplified library for decentralized, privacy preserving machine learning
Apache License 2.0
328 stars, 100 forks

pyt.py updates to reset model back to train mode after loss computation #212

Closed. RadhakrishnaJ closed this 7 months ago

RadhakrishnaJ commented 7 months ago

whl file updates for pyt images

The pyt.py client module sets the model to test (eval) mode to compute the loss. It needs to be set back to train mode afterwards; otherwise, user code that does not itself switch the model back to train mode can run into problems during continuous training. I observed this while testing differential privacy in PyTorch: the differential privacy optimizer expects the model to be in train mode, and because the client code left it in test mode, it threw the error below.
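The idea of restoring the mode after loss computation can be sketched roughly as follows. This is not the actual pyt.py change; `eval_mode` and `StandInModel` are hypothetical names used only for illustration. The stand-in class mimics the `training` attribute and the `train(mode)` / `eval()` methods of `torch.nn.Module` so the sketch runs without PyTorch installed; with PyTorch, a real `torch.nn.Module` would presumably be passed instead.

```python
from contextlib import contextmanager


@contextmanager
def eval_mode(model):
    """Temporarily put `model` in eval mode, then restore its previous mode.

    Hypothetical helper: assumes `model` exposes `training`, `eval()` and
    `train(mode)` the way torch.nn.Module does.
    """
    was_training = model.training  # remember the mode the caller was using
    model.eval()
    try:
        yield model
    finally:
        # Restore the original mode even if the loss computation raises.
        model.train(was_training)


class StandInModel:
    """Hypothetical stand-in with the same mode API as torch.nn.Module."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


model = StandInModel()
with eval_mode(model):
    # The loss would be computed here, with the model in eval mode.
    mode_during_loss = model.training

mode_after_loss = model.training
print(mode_during_loss, mode_after_loss)
```

Saving and restoring the previous mode, rather than unconditionally calling `train()`, also leaves a model untouched if the caller had deliberately put it in eval mode.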

ISSUE:
  File "model/cifar_first_stage.py", line 129, in doTrainBatchPvc
    loss.backward()
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 489, in backward
    self, gradient, retain_graph, create_graph, inputs=inputs
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 199, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 62, in __call__
    return self.hook(module, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/opacus/grad_sample/grad_sample_module.py", line 330, in capture_backprops_hook
    batch_first=batch_first,
  File "/opt/conda/lib/python3.7/site-packages/opacus/grad_sample/grad_sample_module.py", line 388, in rearrange_grad_samples
    activations = module.activations.pop()
IndexError: pop from empty list