ZephyrChenzf / SF-ID-Network-For-NLU

This is the source implementation of the ACL 2019 paper: A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling (https://www.aclweb.org/anthology/P19-1544).
Apache License 2.0

Attention mechanism is not the same as in the cited paper. #12

Open wj-Mcat opened 4 years ago

wj-Mcat commented 4 years ago

The cited paper's attention mechanism is: [screenshots of the attention equations from the paper]

But your attention mechanism is much simpler: https://github.com/ZephyrChenzf/SF-ID-Network-For-NLU/blob/67f0bc7339d007d48f3c2d64ba41c8b0d668cea2/train.py#L113

attn_size = state_shape[2].value
origin_shape = tf.shape(state_outputs)
hidden = tf.expand_dims(state_outputs, 1)
hidden_conv = tf.expand_dims(state_outputs, 2)
# 1x1 convolution == a shared linear projection of every BLSTM state h_j
k = tf.get_variable("AttnW", [1, 1, attn_size, attn_size])
hidden_features = tf.nn.conv2d(hidden_conv, k, [1, 1, 1, 1], "SAME")
hidden_features = tf.reshape(hidden_features, origin_shape)
hidden_features = tf.expand_dims(hidden_features, 1)
v = tf.get_variable("AttnV", [attn_size])
# linear projection of the per-position query (the slot input h_i)
slot_inputs_shape = tf.shape(slot_inputs)
slot_inputs = tf.reshape(slot_inputs, [-1, attn_size])
y = core_rnn_cell._linear(slot_inputs, attn_size, True)
y = tf.reshape(y, slot_inputs_shape)
y = tf.expand_dims(y, 2)
# additive score v^T tanh(W h_j + U h_i), softmax over j, then a weighted sum of the states
s = tf.reduce_sum(v * tf.tanh(hidden_features + y), [3])
a = tf.nn.softmax(s)
a = tf.expand_dims(a, -1)
slot_d = tf.reduce_sum(a * hidden, [2])
slot_reinforce_state = tf.expand_dims(slot_d, 2)
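
For reference, this snippet appears to compute standard additive (Bahdanau-style) attention over the BLSTM outputs, with AttnW in the role of W, the _linear projection in the role of U, and AttnV as v:

e_{i,j} = v^\top \tanh(W h_j + U h_i), \qquad \alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_k \exp(e_{i,k})}, \qquad c_i = \sum_j \alpha_{i,j} h_j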

I can't find the SlotAttention and IntentAttention code. Can anyone help me?

wj-Mcat commented 4 years ago

I have found some example code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotAttention(nn.Module):
    def __init__(self, n_features=64):
        super(SlotAttention, self).__init__()
        self.attention = nn.Linear(n_features, n_features)

    def forward(self, x):
        """
        :param x: hidden states of LSTM (batch_size, seq_len, hidden_size)
        :return: slot attention vector of size (batch_size, seq_len, hidden_size)

        attention = softmax(linear(x) @ x^T) @ x

        """
        weights = self.attention(x)  # (batch_size, seq_len, hidden_size) - temporary weight
        weights = torch.matmul(weights, torch.transpose(x, 1, 2))  # (batch_size, seq_len, seq_len) - att matrix
        weights = F.softmax(weights, dim=2)
        output = torch.matmul(weights, x)
        return output

class IntentAttention(nn.Module):
    def __init__(self, n_features=64):
        super(IntentAttention, self).__init__()
        self.attention = nn.Linear(n_features, n_features)

    def forward(self, x):
        """

        :param x: hidden states of LSTM (batch_size, seq_len, hidden_size)
        :return: intent vector of size (batch_size, hidden_size)
        """
        weights = self.attention(x)  # (batch_size, seq_len, hidden_size) - temporary weight
        # output = torch.matmul(x, weights)
        weights = torch.matmul(weights, torch.transpose(x, 1, 2))  # (batch_size, seq_len, seq_len) - att matrix
        weights = F.softmax(weights, dim=2)
        output = torch.matmul(weights, x)
        output = torch.sum(output, 1)
        return output

But this computation logic is different from the cited paper.
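
For comparison, a minimal PyTorch sketch of the additive scoring that train.py implements might look like the following. This is only an illustrative sketch, not the authors' code; the class and parameter names are made up, and hidden_size is assumed to match the BLSTM output size.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveSlotAttention(nn.Module):
    """Additive (Bahdanau-style) attention over BLSTM outputs.
    Illustrative sketch only -- not the repository's code."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)  # projects the attended states h_j (role of AttnW)
        self.U = nn.Linear(hidden_size, hidden_size, bias=True)   # projects the per-position query h_i (role of _linear)
        self.v = nn.Parameter(torch.randn(hidden_size))           # scoring vector (role of AttnV)

    def forward(self, h):
        # h: (batch, seq_len, hidden_size) BLSTM outputs
        keys = self.W(h).unsqueeze(1)      # (batch, 1, seq_len, hidden)
        queries = self.U(h).unsqueeze(2)   # (batch, seq_len, 1, hidden)
        scores = torch.tanh(keys + queries) @ self.v  # e_{i,j}: (batch, seq_len, seq_len)
        weights = F.softmax(scores, dim=-1)           # alpha_{i,j}, normalized over j
        return weights @ h                            # c_i: (batch, seq_len, hidden_size)

# For the intent context, the same scores could be pooled into a single vector,
# e.g. by summing the per-position contexts or attending with a single query.

The example code above uses a dot-product score (linear(x) @ x^T) instead of the v^T tanh(...) additive score, which looks like the source of the mismatch with the snippet from train.py.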