ZephyrChenzf / SF-ID-Network-For-NLU

This is the source implementation of the ACL 2019 paper: A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling (https://www.aclweb.org/anthology/P19-1544).
Apache License 2.0

Attention mechanism is not the same as in the cited paper. #12

Open wj-Mcat opened 4 years ago

wj-Mcat commented 4 years ago

The cited paper's attention mechanism is: [screenshots of the attention equations from the paper]

But your attention mechanism is much simpler: https://github.com/ZephyrChenzf/SF-ID-Network-For-NLU/blob/67f0bc7339d007d48f3c2d64ba41c8b0d668cea2/train.py#L113

attn_size = state_shape[2].value
origin_shape = tf.shape(state_outputs)
hidden = tf.expand_dims(state_outputs, 1)
hidden_conv = tf.expand_dims(state_outputs, 2)
# 1x1 convolution == a shared linear projection of every BLSTM state h_j
k = tf.get_variable("AttnW", [1, 1, attn_size, attn_size])
hidden_features = tf.nn.conv2d(hidden_conv, k, [1, 1, 1, 1], "SAME")
hidden_features = tf.reshape(hidden_features, origin_shape)
hidden_features = tf.expand_dims(hidden_features, 1)
v = tf.get_variable("AttnV", [attn_size])
# linear projection of the per-position query (the slot input h_i)
slot_inputs_shape = tf.shape(slot_inputs)
slot_inputs = tf.reshape(slot_inputs, [-1, attn_size])
y = core_rnn_cell._linear(slot_inputs, attn_size, True)
y = tf.reshape(y, slot_inputs_shape)
y = tf.expand_dims(y, 2)
# additive score v^T tanh(W h_j + U h_i), softmax over j, then a weighted sum of the states
s = tf.reduce_sum(v * tf.tanh(hidden_features + y), [3])
a = tf.nn.softmax(s)
a = tf.expand_dims(a, -1)
slot_d = tf.reduce_sum(a * hidden, [2])
slot_reinforce_state = tf.expand_dims(slot_d, 2)
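
For reference, this snippet appears to compute standard additive (Bahdanau-style) attention over the BLSTM outputs, with AttnW in the role of W, the _linear projection in the role of U, and AttnV as v:

e_{i,j} = v^\top \tanh(W h_j + U h_i), \qquad \alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_k \exp(e_{i,k})}, \qquad c_i = \sum_j \alpha_{i,j} h_j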

I can't find the SlotAttention and IntentAttention code. Can anyone help me?

wj-Mcat commented 4 years ago

I have found some example code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotAttention(nn.Module):
    def __init__(self, n_features=64):
        super(SlotAttention, self).__init__()
        self.attention = nn.Linear(n_features, n_features)

    def forward(self, x):
        """
        :param x: hidden states of LSTM (batch_size, seq_len, hidden_size)
        :return: slot attention vector of size (batch_size, seq_len, hidden_size)

        attention = softmax(linear(x) @ x^T) @ x

        """
        weights = self.attention(x)  # (batch_size, seq_len, hidden_size) - temporary weight
        weights = torch.matmul(weights, torch.transpose(x, 1, 2))  # (batch_size, seq_len, seq_len) - att matrix
        weights = F.softmax(weights, dim=2)
        output = torch.matmul(weights, x)
        return output

class IntentAttention(nn.Module):
    def __init__(self, n_features=64):
        super(IntentAttention, self).__init__()
        self.attention = nn.Linear(n_features, n_features)

    def forward(self, x):
        """

        :param x: hidden states of LSTM (batch_size, seq_len, hidden_size)
        :return: intent vector of size (batch_size, hidden_size)
        """
        weights = self.attention(x)  # (batch_size, seq_len, hidden_size) - temporary weight
        # output = torch.matmul(x, weights)
        weights = torch.matmul(weights, torch.transpose(x, 1, 2))  # (batch_size, seq_len, seq_len) - att matrix
        weights = F.softmax(weights, dim=2)
        output = torch.matmul(weights, x)
        output = torch.sum(output, 1)
        return output

But this computation logic is different from the cited paper.
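
For comparison, a minimal PyTorch sketch of the additive scoring that train.py implements might look like the following. This is only an illustrative sketch, not the authors' code; the class and parameter names are made up, and hidden_size is assumed to match the BLSTM output size.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveSlotAttention(nn.Module):
    """Additive (Bahdanau-style) attention over BLSTM outputs.
    Illustrative sketch only -- not the repository's code."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)  # projects the attended states h_j (role of AttnW)
        self.U = nn.Linear(hidden_size, hidden_size, bias=True)   # projects the per-position query h_i (role of _linear)
        self.v = nn.Parameter(torch.randn(hidden_size))           # scoring vector (role of AttnV)

    def forward(self, h):
        # h: (batch, seq_len, hidden_size) BLSTM outputs
        keys = self.W(h).unsqueeze(1)      # (batch, 1, seq_len, hidden)
        queries = self.U(h).unsqueeze(2)   # (batch, seq_len, 1, hidden)
        scores = torch.tanh(keys + queries) @ self.v  # e_{i,j}: (batch, seq_len, seq_len)
        weights = F.softmax(scores, dim=-1)           # alpha_{i,j}, normalized over j
        return weights @ h                            # c_i: (batch, seq_len, hidden_size)

# For the intent context, the same scores could be pooled into a single vector,
# e.g. by summing the per-position contexts or attending with a single query.

The example code above uses a dot-product score (linear(x) @ x^T) instead of the v^T tanh(...) additive score, which looks like the source of the mismatch with the snippet from train.py.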