TobiasLee / MTA-LSTM-TensorFlow

TensorFlow reimplementation of Topic-to-Essay Generation with Neural Networks.
https://tobiaslee.top/2018/11/02/customized-RNN-cell/

Question about the atten_sum in the original version of HIT #5

Closed: caoxu915683474 closed this issue 5 years ago

caoxu915683474 commented 5 years ago

I reviewed HIT's code and found that their version keeps an atten_sum vector and adds the term 0.1 * tf.reduce_sum((phi_res - atten_sum) ** 2) to the sequence_loss.
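The extra term compares two quantities for each topic word. One is the attention accumulated over all decoding steps (atten_sum). The other is the target phi_res. The term penalizes the squared gap between them with weight 0.1. Below is a minimal pure-Python sketch of that arithmetic only. It assumes plain nested lists shaped [batch_size][num_keywords] stand in for TensorFlow tensors. The function name `coverage_penalty` and the example values are hypothetical.

```python
def coverage_penalty(phi_res, atten_sum, weight=0.1):
    """Return weight * sum over all entries of (phi_res - atten_sum) ** 2.

    Mirrors 0.1 * tf.reduce_sum((phi_res - atten_sum) ** 2), with both
    arguments given as nested lists of shape [batch_size][num_keywords].
    """
    total = 0.0
    for phi_row, attn_row in zip(phi_res, atten_sum):
        for phi, attn in zip(phi_row, attn_row):
            total += (phi - attn) ** 2
    return weight * total


# Hypothetical values: 2 examples, 3 topic words each.
phi_res = [[1.0, 2.0, 0.5], [0.0, 1.0, 1.0]]
atten_sum = [[1.0, 1.0, 0.5], [0.5, 1.0, 0.0]]
loss = coverage_penalty(phi_res, atten_sum)
print(loss)
```

In the model, this value would be added to the sequence loss. That only works if the final atten_sum is actually reachable from the loss graph, which is what the rest of this thread is about.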

I pass a vector atten_sum as a parameter to MTAWrapper(...), assign it to self.atten_sum in the wrapper's __init__(), and then add each step's attention score to self.atten_sum in the call function. The problem is that atten_sum never changes, and I don't know why.

self.atten_sum = tf.ones([hp.batch_size, hp.source_max_length], dtype=tf.float32) * 0.0001

if hp.mode == "train":

    print("mta-lstm training...")

    self.initial_state = self.decoder_cell.zero_state(batch_size=hp.batch_size, dtype=tf.float32)
    if hp.scheduled_sampling:
        print("using scheduled_sampling...")
        helper_pt = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(
            inputs=self.decoder_input_embedded,
            sequence_length=[hp.target_max_length] * hp.batch_size,
            embedding=self.embedding,
            sampling_probability=hp.sampling_prob)
    else:
        helper_pt = tf.contrib.seq2seq.TrainingHelper(
            inputs=self.decoder_input_embedded,
            sequence_length=[hp.target_max_length] * hp.batch_size,
            time_major=False)

    training_cell = MTAWrapper(hp, self.decoder_cell, self.topic_encoding,
                               self.v, self.uf, self.atten_sum, self.query_layer,
                               self.memory_layer, mask=masks, topic_mask=topic_mask)
    decoder_pt = tf.contrib.seq2seq.BasicDecoder(cell=training_cell,
                                                 helper=helper_pt,
                                                 initial_state=self.initial_state,
                                                 output_layer=self.output_layer)

class MTAWrapper(RNNCell):

    def __init__(self, hp, cell, memory, v, uf, atten_sum,
                 query_layer, memory_layer,
                 mask=None, topic_mask=None, max_len=100,
                 attention_size=128,
                 state_is_tuple=True):

        """ Multi-Topic aware wrapper of LSTM

        Args:

          cell: an RNNCell, a projection of inputs is added before it.
          memory: topic embedding of topic words
          mask: seq_len_mask

        Raises:

          TypeError: if cell is not an RNNCell

        """
        if not isinstance(cell, RNNCell):
            raise TypeError("The parameter cell is not RNNCell.")
        self._cell = cell

        if hp.beam_search:

            memory = tf.contrib.seq2seq.tile_batch(memory, hp.beam_size)
            mask = tf.contrib.seq2seq.tile_batch(mask, hp.beam_size)
            atten_sum = tf.contrib.seq2seq.tile_batch(atten_sum, hp.beam_size)

        self.memory = memory
        self._state_is_tuple = state_is_tuple
        self.attention_size = attention_size
        self.topic_mask = topic_mask
        self.batch_size = self.memory.shape[0].value
        self.num_keywords = self.memory.shape[1].value
        self.embedding_size = self.memory.shape[2].value
        self.coverage_vector = array_ops.ones([self.batch_size, self.num_keywords])
        self.atten_sum = atten_sum

        if mask is None:
            self.seq_len = array_ops.ones([self.batch_size, 1]) # inference
        else:
            self.seq_len = math_ops.reduce_sum(mask, axis=1, keep_dims=True) # training

        self.v = v
        self.query_layer = query_layer
        self.memory_layer = memory_layer

        self.u_f = uf
        res1 = math_ops.sigmoid(math_ops.matmul(array_ops.reshape(self.memory, [self.batch_size, -1]), self.u_f)) # batch_size * num_keyword

        self.phi_res = self.seq_len * res1 * 0.001 # batch_size * num_keywords

        print(self.u_f)

    @property
    def state_size(self):
        return self._cell.state_size

    @property
    def output_size(self):
        return self._cell.output_size

    def __call__(self, inputs, state, scope=None):

        c_t, h_t = state
        dtype = inputs.dtype

        with tf.variable_scope("topic_attention"):

            # Attention
            keys = self.memory_layer(self.memory)
            processed_query = tf.expand_dims(self.query_layer(h_t), 1)
            score = self.coverage_vector * tf.reduce_sum(self.v * tf.tanh(keys + processed_query), [2])
            padding = tf.ones_like(score) * (-2 ** 32 + 1)
            score = tf.where(tf.equal(self.topic_mask, False), padding, score)
            score = tf.nn.softmax(score, axis=1)
            score_tile = tf.tile(tf.expand_dims(score, -1), [1, 1, self.embedding_size], name="weight")

            mt = tf.reduce_mean(self.memory * score_tile, axis=1)

            self.atten_sum = tf.add(self.atten_sum, score)

            # update coverage vector
            self.coverage_vector = self.coverage_vector - score / self.phi_res
        return self._cell(tf.concat([inputs, mt], axis=1), state)
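The per-step logic of `__call__` can be followed in isolation. Below is a minimal pure-Python sketch of one topic-attention step for a single example, with no batch dimension. It assumes the raw energies, i.e. the `tf.reduce_sum(self.v * tf.tanh(keys + processed_query), [2])` term, are already computed and passed in as a list. The function name `topic_attention_step` and the example values are hypothetical, and the wrapped LSTM call is omitted.

```python
import math


def topic_attention_step(raw_scores, topic_mask, memory, coverage, atten_sum, phi_res):
    """One topic-attention step, mirroring MTAWrapper.__call__ for one example.

    raw_scores: unnormalized attention energies, one per topic word.
    topic_mask: bools, False marks a padded topic slot.
    memory: topic embeddings, one list of floats per topic word.
    Returns (mt, score, new_coverage, new_atten_sum).
    """
    # Scale the energies by the coverage vector, as the original code does.
    scaled = [c * s for c, s in zip(coverage, raw_scores)]
    # Replace padded slots with a very large negative value before the softmax.
    padding = -2 ** 32 + 1
    scaled = [s if keep else padding for s, keep in zip(scaled, topic_mask)]
    # Numerically stable softmax over topic words.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    score = [e / total for e in exps]
    # Weighted topic vector; the original code averages (reduce_mean) over topic words.
    num_topics = len(memory)
    dim = len(memory[0])
    mt = [sum(score[j] * memory[j][d] for j in range(num_topics)) / num_topics
          for d in range(dim)]
    # Accumulate attention and consume coverage, returning new lists.
    new_atten_sum = [a + s for a, s in zip(atten_sum, score)]
    new_coverage = [c - s / p for c, s, p in zip(coverage, score, phi_res)]
    return mt, score, new_coverage, new_atten_sum


mt, score, coverage, atten_sum = topic_attention_step(
    raw_scores=[1.0, 1.0, 5.0],
    topic_mask=[True, True, False],
    memory=[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],
    coverage=[1.0, 1.0, 1.0],
    atten_sum=[0.0, 0.0, 0.0],
    phi_res=[2.0, 2.0, 2.0],
)
print(score)      # [0.5, 0.5, 0.0]
print(coverage)   # [0.75, 0.75, 1.0]
```

The function hands back `new_atten_sum` rather than changing its argument. A caller therefore has to keep the returned value for the next step, the same pattern `tf.add` follows inside `__call__`.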

TobiasLee commented 5 years ago

What do you mean by "atten_sum doesn't change"? Can you provide more details? It seems you want to add an extra term to the sequence loss. One hint that might help: due to a limitation of TensorFlow, a variable used inside __call__ cannot be fetched.

caoxu915683474 commented 5 years ago

Thanks for the reply! I'll explain in Chinese, since I'm afraid I can't describe this clearly in English...

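Here is what I did. I created an atten_sum variable in the model and passed it into the wrapper, just like the uf and v vectors, and then added score to it inside the wrapper's call() function. But when I fetch the model's atten_sum with sess.run, it is always 0. It was passed into the wrapper and had score added to it, so after one pass over the data it should have changed... Could it be that the atten_sum I passed in gets the additions, but the model's atten_sum does not change? Yet when I printed the atten_sum variable before and after passing it in, both pointed to the same memory location...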

TobiasLee commented 5 years ago

add should return a new Tensor, so why would it point to the same memory? That's a bit strange.
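This point can be reproduced without TensorFlow. In Python, `self.atten_sum = tf.add(self.atten_sum, score)` first evaluates the right-hand side into a new object and then rebinds the attribute to it. The object that was passed into `__init__`, which the model still holds, is left untouched. The sketch below is an analogy, assuming tuples stand in for immutable tensors. The classes `Wrapper` and `Model` are hypothetical.

```python
class Wrapper:
    def __init__(self, atten_sum):
        # At this point this is the very same object the caller holds.
        self.atten_sum = atten_sum

    def step(self, score):
        # Like tf.add: build a new value, then rebind the attribute to it.
        self.atten_sum = tuple(a + s for a, s in zip(self.atten_sum, score))


class Model:
    def __init__(self):
        self.atten_sum = (0.0, 0.0)
        self.wrapper = Wrapper(self.atten_sum)


model = Model()
same_before = model.wrapper.atten_sum is model.atten_sum
model.wrapper.step((0.25, 0.75))
same_after = model.wrapper.atten_sum is model.atten_sum
print(same_before, same_after)      # True False
print(model.atten_sum)              # (0.0, 0.0)
print(model.wrapper.atten_sum)      # (0.25, 0.75)
```

An identity check made before the first `step` reports the same object, so printing before the add cannot reveal the rebinding.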

caoxu915683474 commented 5 years ago

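Uh, I printed it before the add (silly me...). So after self.atten_sum = tf.add(self.atten_sum, score), this self.atten_sum is no longer the same object as the atten_sum that was passed in, right? Is there a way to make the add write directly into the atten_sum at the same memory location?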

TobiasLee commented 5 years ago

I remember there is an in-place add you could try; failing that, you can also use assign to give self.atten_sum a new value.
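Both suggestions amount to updating one shared buffer instead of producing a new value. The sketch below shows that difference in plain Python, assuming a mutable list stands in for a variable. An in-place update, the analogue of an assign on a variable, is visible through every reference to the buffer. This is only an analogy for the object-identity issue. In TensorFlow the assign op must also actually be executed, for example by being fetched in sess.run. The sketch models neither that nor how such updates interact with the decoder's internal loop. The class `Wrapper` is hypothetical.

```python
class Wrapper:
    def __init__(self, atten_sum):
        # Keep a reference to the caller's mutable buffer.
        self.atten_sum = atten_sum

    def step(self, score):
        # In-place update: write into the existing buffer instead of rebinding.
        for i, s in enumerate(score):
            self.atten_sum[i] += s


shared = [0.0, 0.0]
wrapper = Wrapper(shared)
for score in [(0.25, 0.75), (0.5, 0.5)]:
    wrapper.step(score)
print(wrapper.atten_sum is shared)  # True
print(shared)                       # [0.75, 1.25]
```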

caoxu915683474 commented 5 years ago

OK, thanks for the help!! I'll give it a try.