alibaba / FederatedScope

An easy-to-use federated learning platform
https://www.federatedscope.io
Apache License 2.0

trainer: `_param_filter` filters trainable parameters by mistake #733

Closed XueBaolu closed 7 months ago

XueBaolu commented 8 months ago

When `self.cfg.personalization.local_param = []`, the function `_param_filter` filters out all of the trainable parameters. Perhaps it can be fixed by adding `if len(filter_keywords) == 0: return state_dict` around line 392.
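To make the proposed fix concrete, here is a minimal sketch of the guard; the surrounding body is paraphrased from this report, and the keyword-matching step is an assumption for illustration, not a verbatim copy of trainer.py:

```python
def _param_filter(self, state_dict, filter_keywords=None):
    # Sketch of the reported behavior and the proposed guard (paraphrased).
    if filter_keywords is None:
        filter_keywords = self.cfg.personalization.local_param

    # Proposed fix: with no personalization keywords configured, share the
    # full state_dict instead of filtering everything out.
    if len(filter_keywords) == 0:
        return state_dict

    # Otherwise drop parameters whose names match a local-only keyword
    # and share the rest.
    return dict(
        filter(lambda elem: not any(kw in elem[0] for kw in filter_keywords),
               state_dict.items()))
```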

yxdyc commented 8 months ago

Thank you for your interest in FS! When `filter_keywords = self.cfg.personalization.local_param = []`, at line 396 of trainer.py the function `filter_by_specified_keywords` actually skips its for loop, so `preserve` is always `True`. Thus the problem should not be caused by `personalization.local_param = []`.
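To illustrate the point, here is a short sketch of that keyword check as described above (paraphrased, not quoted from the source): with an empty keyword list the loop body never runs, so every parameter is preserved.

```python
def filter_by_specified_keywords(param_name, filter_keywords):
    # Preserve (share) the parameter unless its name contains one of the
    # personalization keywords.
    preserve = True
    for kw in filter_keywords:
        if kw in param_name:
            preserve = False
            break
    return preserve

# With an empty keyword list the loop is skipped, so everything is preserved:
assert filter_by_specified_keywords("fc.weight", []) is True
assert filter_by_specified_keywords("fc.weight", ["fc"]) is False
```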

Maybe the issue you reported has already been fixed in this PR. What are your specific test scripts and environment (especially the version of FS)?

XueBaolu commented 8 months ago

Thank you for responding. I downloaded the code recently, so the version of FS I am using is up to date. But when I run FL with ResNet-18 on CIFAR-10, the accuracy of the global model is 0.1 after 100 rounds. I found that the function `_param_filter` filters out all of the trainable parameters, because the condition in `filter(lambda elem ...` is always true. After adding `if len(filter_keywords) == 0: return state_dict` at line 392, the accuracy becomes 0.5, a normal value.
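A quick way to check whether the filtering is really the culprit is to compare what `_param_filter` returns against the model's full state dict. This is a hypothetical debugging snippet; the `trainer.ctx.model` attribute path is an assumption for illustration, not a documented entry point:

```python
# Hypothetical sanity check: if shared_state is empty while
# personalization.local_param is empty too, the over-filtering is confirmed.
full_state = trainer.ctx.model.state_dict()      # assumed attribute path
shared_state = trainer._param_filter(full_state)
print(f"full params: {len(full_state)}, shared after filter: {len(shared_state)}")
```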

rayrayraykk commented 8 months ago

Thank you for reaching out with your issue. We have attempted to reproduce the problem based on your description, but everything appears to be functioning as expected on our end: `python federatedscope/main.py --cfg scripts/example_configs/femnist.yaml`

(screenshot of the reproduced training results)

Additionally, here are the relevant git logs: (screenshot of the git log)

It's possible that the issue might be related to the model's trainability or an inadvertent switch to eval mode. Could you please double-check your model configuration and ensure it is set to training mode? If the issue persists, feel free to share more details or any error messages you encounter, and we'll be happy to take another look.
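As a side note, a quick way to verify the trainability and training-mode points in plain PyTorch (a generic check, not FS-specific):

```python
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the actual model under test

# All parameters should require gradients if the model is meant to be trained.
print(all(p.requires_grad for p in model.parameters()))  # True

# model.training is True only in training mode; an accidental model.eval()
# left over from evaluation disables Dropout and freezes BatchNorm statistics.
print(model.training)  # True by default
model.eval()
print(model.training)  # False -> call model.train() again before training
model.train()
```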

Thank you for using our project, and we are committed to helping you resolve this matter.

XueBaolu commented 8 months ago

Thank you. Something may have gone wrong on my side, resulting in the discrepancy between our experimental results. In any case, I am pleased that this error has been resolved.