PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

dynamic_lstm: correspondence between parameters and LSTM gates #10768

Closed seanxh closed 6 years ago

seanxh commented 6 years ago

import numpy as np
import paddle.fluid as fluid

shape = [2]
gate_size = 1
data = fluid.layers.data(name='data', shape=shape, dtype='float32')
# Input-to-hidden projection: dynamic_lstm expects the caller to project
# the input to 4 * gate_size (one block per gate).
input_forward_proj = fluid.layers.fc(name='fc_0', input=data,
                                     size=gate_size * 4,
                                     act=None,
                                     bias_attr=False)
forward, state = fluid.layers.dynamic_lstm(
    name='blstm_0',
    input=input_forward_proj, size=gate_size * 4,
    use_peepholes=False, gate_activation='sigmoid')

place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[data])  # unused below

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# lod_tensor was not defined in the original snippet; build an example
# LoDTensor holding one sequence of 3 steps with feature size 2.
lod_tensor = fluid.create_lod_tensor(
    np.random.rand(3, 2).astype('float32'), [[3]], place)

results = exe.run(fluid.default_main_program(),
                  feed={'data': lod_tensor},
                  fetch_list=['blstm_0.w_0'], return_numpy=False)
print(np.array(results[0]).shape)  # expect (gate_size, 4 * gate_size) = (1, 4)

For the LSTM, since we set use_peepholes=False, we get a plain LSTM. The formulas should be:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)

So the fc_0 layer should have 2 * (1 * 4) parameters (the x_t part of each gate weight), and the LSTM's internal kernel should likewise have 1 * (1 * 4) parameters (the h_{t-1} part). But:

1. What is the correspondence between the fc_0 parameters, blstm_0.w_0, and blstm_0.b_0 on one side and W_f, W_i, W_c, W_o on the other?

2. Can gate_activation be customized? The current documentation says it must be one of [tanh, identity, relu, sigmoid].
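
The expected sizes can be checked by fetching the parameters by name; a minimal sketch against the snippet above, assuming fluid's default parameter names fc_0.w_0, blstm_0.w_0, and blstm_0.b_0:

# Minimal shape check against the program built above (assumes the
# startup program has already run and these default parameter names).
for param_name in ['fc_0.w_0', 'blstm_0.w_0', 'blstm_0.b_0']:
    tensor = fluid.global_scope().find_var(param_name).get_tensor()
    print(param_name, np.array(tensor).shape)
# Expected: fc_0.w_0 -> (2, 4), blstm_0.w_0 -> (1, 4),
# and blstm_0.b_0 -> (1, 4) because use_peepholes=False.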

guoshengCS commented 6 years ago

1. blstm_0.w_0 can be split into four weights of size gate_size * gate_size, which are the four hidden-to-hidden gate weight matrices inside the LSTM; in their concatenated order they correspond to W_c, W_i, W_f, W_o. (The input-to-hidden weights of size input_size * gate_size are the corresponding blocks of the fc_0 weight, since dynamic_lstm leaves the input projection to the caller.)

2. gate_activation currently supports only these four: tanh, identity, relu, sigmoid.
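
In code, the split described above might look like the following; a minimal sketch, assuming w is the blstm_0.w_0 value fetched in the snippet above, of shape (gate_size, 4 * gate_size), with the c, i, f, o order just stated:

# Split the recurrent weight into its four per-gate blocks.
w = np.array(results[0])                     # (gate_size, 4 * gate_size)
W_c, W_i, W_f, W_o = np.split(w, 4, axis=1)  # each (gate_size, gate_size)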

gaoyiyeah commented 5 years ago

Same question, mirrored: what is the corresponding gate order for GRU? @guoshengCS