Closed · emailweixu closed this issue 6 years ago
Looking at the test, it seems we need to set the same name in param_attr to share parameters. Simply setting the name doesn't work because the NameGenerator will append a unique suffix.
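To make the distinction concrete, here is a minimal pure-Python sketch (not Fluid's actual implementation; `NameGenerator`, `fc_layer`, and `fc_layer_shared` are illustrative names) of why a plain name gets uniquified while an explicit param_attr-style full name allows sharing:

```python
import itertools

class NameGenerator:
    """Toy stand-in for Fluid's name generator: it appends a unique
    suffix to every requested name, so two layers asking for the same
    base name still get distinct parameter names."""
    def __init__(self):
        self._counter = itertools.count()

    def generate(self, base):
        return "%s_%d" % (base, next(self._counter))

gen = NameGenerator()
params = {}

def fc_layer(name):
    # Without an explicit param_attr name, each call registers a new,
    # uniquely suffixed parameter -- no sharing happens.
    full_name = gen.generate(name)
    params[full_name] = object()
    return full_name

a = fc_layer("fc.w")
b = fc_layer("fc.w")
print(a, b)                    # fc.w_0 fc.w_1 -- two distinct parameters

def fc_layer_shared(full_name):
    # With an explicit, fixed full name (the param_attr approach),
    # both calls resolve to the same parameter entry.
    params.setdefault(full_name, object())
    return full_name

c = fc_layer_shared("shared.w")
d = fc_layer_shared("shared.w")
print(params[c] is params[d])  # True -- one parameter, shared
```

This is the "same full name" mechanism the issue describes: it works, but the burden of keeping full names consistent falls entirely on the user.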
Another way is to use a unique_name.guard with a new generator; that way we can make different layers produce the same parameter names. However, I heard there is a side effect: unrelated temp variables also end up sharing the same name. (Perhaps adding a parameter_only option would work.)
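The guard idea and its side effect can be sketched in pure Python (this models the behavior of Fluid's unique_name.guard; `name_guard` and `build_subnet` are illustrative names, not the real API):

```python
import contextlib
import itertools

_counter = itertools.count()

def unique_name(base):
    return "%s_%d" % (base, next(_counter))

@contextlib.contextmanager
def name_guard():
    """Toy model of unique_name.guard: swap in a fresh counter so the
    suffix sequence restarts inside the `with` block. Two subnetworks
    built under identical guards then generate identical names."""
    global _counter
    saved = _counter
    _counter = itertools.count()
    try:
        yield
    finally:
        _counter = saved

def build_subnet():
    w = unique_name("fc.w")   # parameter -- a name collision is desired
    tmp = unique_name("tmp")  # temp variable -- a collision is NOT desired
    return w, tmp

with name_guard():
    w1, t1 = build_subnet()
with name_guard():
    w2, t2 = build_subnet()

print(w1 == w2)  # True: parameters share a name, as intended
print(t1 == t2)  # True: but temporaries clash too -- the side effect
```

Because the guard resets every generated name, not just parameter names, the temporaries collide as well; a parameter_only flag would restrict the reset to parameter names only.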
AFAIK, Keras also uses 1), and TensorFlow is considering deprecating 2) in TF 2.0 (with a lot of difficulty, given the many existing users).
I also feel 1) is better; that's why our RL framework chose to wrap around Fluid to achieve it. Perhaps we can support 1) while keeping compatibility with the current way, and gradually migrate existing code to 1).
In any case, parameter sharing is an essential part of deep learning models, and we should have a good solution for it. Otherwise the framework is defective.
Yes, the object-oriented way is easier to understand for both Python beginners and experts.
What about #6887 ?
Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up on this question after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for the inconvenience caused by the closure, and thank you so much for your support of the PaddlePaddle group!
The current way of sharing a parameter between two parts of a model is to use the parameter's same full name in both places. This becomes very cumbersome for large models. There are two common ways of achieving parameter sharing:
1) The object-oriented approach. This is used by PyTorch (https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#define-the-network). Our reinforcement learning framework also chose this approach (https://github.com/PaddlePaddle/PARL/blob/develop/parl/layers/tests/test_param_sharing.py) because it does not require using any names, which are error-prone.
2) variable_scope. This is used by TensorFlow (https://www.tensorflow.org/api_docs/python/tf/variable_scope).
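The essence of approach 1) can be shown with a minimal pure-Python sketch (in the spirit of PyTorch's nn.Module or PARL's layer wrappers, but not either library's real API; `Linear`, `branch_a`, and `branch_b` are illustrative names): sharing happens simply by reusing the same layer object, so no names are involved at all.

```python
class Linear:
    """Minimal layer object that owns its parameters directly.
    Plain nested lists stand in for real weight tensors."""
    def __init__(self, in_dim, out_dim):
        self.weight = [[0.0] * in_dim for _ in range(out_dim)]

    def __call__(self, x):
        # Matrix-vector product over the stored weights.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]

shared = Linear(2, 2)

def branch_a(x):
    return shared(x)   # both branches call the same object,

def branch_b(x):
    return shared(x)   # so they share parameters by construction

shared.weight[0][0] = 1.0    # update the weights once...
print(branch_a([1.0, 0.0]))  # [1.0, 0.0]
print(branch_b([1.0, 0.0]))  # [1.0, 0.0] -- both branches see the update
```

Because the parameters live on the object, sharing is guaranteed by object identity rather than by a user keeping name strings in sync.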
Approaches 1) and 2) lead to very different ways of writing models. Given our current state, perhaps we should implement a mechanism similar to variable scope.
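For reference, here is a toy pure-Python sketch of the TF-1.x-style variable_scope mechanism being proposed (illustrative only, not TensorFlow's real implementation; `variable_scope` and `get_variable` mimic the TF names but the bodies are assumptions):

```python
import contextlib

_variables = {}
_scope_stack = []

@contextlib.contextmanager
def variable_scope(name, reuse=False):
    """Names are prefixed by the scope stack; reuse=True makes
    get_variable return an existing variable instead of raising."""
    _scope_stack.append((name, reuse))
    try:
        yield
    finally:
        _scope_stack.pop()

def get_variable(name):
    prefix = "/".join(s for s, _ in _scope_stack)
    full = "%s/%s" % (prefix, name) if prefix else name
    reuse = _scope_stack[-1][1] if _scope_stack else False
    if full in _variables:
        if not reuse:
            raise ValueError("variable %s exists; pass reuse=True" % full)
        return _variables[full]
    _variables[full] = object()
    return _variables[full]

with variable_scope("model"):
    v1 = get_variable("w")
with variable_scope("model", reuse=True):
    v2 = get_variable("w")
print(v1 is v2)  # True -- same underlying variable, shared by scope name
```

Sharing here still rests on names, but the scope machinery manages the prefixes and catches accidental reuse, which is what makes it less error-prone than raw full names.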