PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle (飞桨): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Parameter sharing #11209

Closed emailweixu closed 6 years ago

emailweixu commented 6 years ago

The current way of sharing a parameter between two parts of a model is to use the same full parameter name in both places. This becomes very cumbersome when sharing large models. There are two ways of achieving parameter sharing:

1). The object-oriented approach. This is used by PyTorch (https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#define-the-network). Our reinforcement learning framework currently also chooses this approach (https://github.com/PaddlePaddle/PARL/blob/develop/parl/layers/tests/test_param_sharing.py) because it does not require using any names, which is error prone.

2). variable_scope. This is used by TensorFlow (https://www.tensorflow.org/api_docs/python/tf/variable_scope).

1) and 2) result in very different ways of writing models. Given our current state, perhaps we should implement a mechanism similar to variable scope.
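For concreteness, a minimal sketch of approach 1) in PyTorch: sharing happens simply by reusing the same layer object, with no names involved (the sizes and module names below are made up for illustration).

```python
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    def __init__(self):
        super(TwoBranchNet, self).__init__()
        # A single Linear module; both branches below reuse this object,
        # so they automatically share its weight and bias.
        self.shared_fc = nn.Linear(10, 10)

    def forward(self, x, y):
        # Gradients from both branches accumulate into the same parameters.
        return self.shared_fc(x) + self.shared_fc(y)
```

And a sketch of approach 2) in the TF 1.x style: sharing is driven by the enclosing variable_scope and its reuse flag rather than by the Python objects (again, shapes and scope names are illustrative).

```python
import tensorflow as tf  # TF 1.x style API

def branch(x):
    return tf.layers.dense(x, 10, name="fc")

x1 = tf.placeholder(tf.float32, [None, 10])
x2 = tf.placeholder(tf.float32, [None, 10])

with tf.variable_scope("shared", reuse=tf.AUTO_REUSE):
    out1 = branch(x1)
    out2 = branch(x2)  # reuses the "shared/fc" variables created for out1
```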

panyx0718 commented 6 years ago

Looking at the test, it seems we need to set the same name in param_attr to share a parameter. Simply setting the name doesn't work because the NameGenerator appends a unique suffix.
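For reference, a minimal sketch of that current mechanism, assuming the present fluid API (the parameter name 'shared_fc.w_0' is made up):

```python
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[10], dtype='float32')
y = fluid.layers.data(name='y', shape=[10], dtype='float32')

# Giving both fc layers a ParamAttr with the same full parameter name
# makes them resolve to the same weight variable; the bias would need
# a named bias_attr as well to be shared.
shared_w = fluid.ParamAttr(name='shared_fc.w_0')
out1 = fluid.layers.fc(input=x, size=10, param_attr=shared_w)
out2 = fluid.layers.fc(input=y, size=10, param_attr=shared_w)
```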

Another way is to use a unique_name.guard with a new generator. That way, we can make different layers end up with the same parameter name. However, I heard there is a side effect: unrelated temporary variables also end up sharing the same name. (Perhaps adding a parameter_only option would work.)
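Roughly, the idea would be something like this (a sketch only, assuming fluid.unique_name.guard resets the name generator as described):

```python
import paddle.fluid as fluid

def build_branch(x):
    # Parameter names such as fc_0.w_0 are auto-generated here.
    return fluid.layers.fc(input=x, size=10)

x = fluid.layers.data(name='x', shape=[10], dtype='float32')
y = fluid.layers.data(name='y', shape=[10], dtype='float32')

# Each guard starts from a fresh name generator, so both branches produce
# identical parameter names and thus share parameters. The side effect is
# that temporary output variables also get duplicated names.
with fluid.unique_name.guard():
    out1 = build_branch(x)
with fluid.unique_name.guard():
    out2 = build_branch(y)
```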


AFAIK, Keras also uses 1), and TensorFlow is considering deprecating 2) in TF 2.0 (with a lot of difficulty, given the large number of existing users).

emailweixu commented 6 years ago

I also feel 1) is better; that's why our RL framework chose to wrap around fluid to do it. Perhaps we can support 1) while keeping compatibility with the current way, and gradually change existing code to use 1).
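For context, this is roughly how a PARL-style wrapper achieves 1) on top of fluid, as I understand it: each layer object fixes its parameter names once, so calling the same object twice shares the parameters (the class and naming scheme below are illustrative, not PARL's actual code):

```python
import paddle.fluid as fluid

class FC(object):
    """Illustrative object-oriented wrapper: one instance == one set of parameters."""
    _count = 0

    def __init__(self, size):
        self.size = size
        # Fix the parameter names once, at construction time.
        prefix = 'fc_obj_%d' % FC._count
        FC._count += 1
        self.param_attr = fluid.ParamAttr(name=prefix + '.w')
        self.bias_attr = fluid.ParamAttr(name=prefix + '.b')

    def __call__(self, x):
        # Every call builds a new fc op but reuses the same named parameters.
        return fluid.layers.fc(input=x, size=self.size,
                               param_attr=self.param_attr,
                               bias_attr=self.bias_attr)

# shared = FC(10); out1 = shared(x); out2 = shared(y)  # out1 and out2 share weights
```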

Anyway, parameter sharing is an important part of deep learning models. We should have a good solution for it; otherwise the framework is defective.

Superjomn commented 6 years ago

Yes, the object-oriented way is easier to understand for both Python beginners and experts.

reyoung commented 6 years ago

What about #6887 ?

shanyi15 commented 6 years ago

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up after it is closed, please feel free to reopen it and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you so much for your support of PaddlePaddle!