chrisvdweth / ml-toolkit


Possible dimensional mismatch ? #1

Open rajesh-gupta-14 opened 4 years ago

rajesh-gupta-14 commented 4 years ago

Hi Chris

In one of your classifiers, rnn.py, I found a possible bug. I might be wrong, but just FYI.

In your `forward()` function:

""" if True: if self.rnn_type == 'gru': final_state = self.hidden.view(self.num_layers, self.directions_count, X_sorted.shape[0], self.rnn_hidden_dim)[-1] elif self.rnn_type == 'lstm': final_state = self.hidden[0].view(self.num_layers, self.directions_count, X_sorted.shape[0], self.rnn_hidden_dim)[-1] else: raise Exception('Unknown rnn_type. Valid options: "gru", "lstm"')

        if self.directions_count == 1:
            X = final_state.squeeze()
        elif self.directions_count == 2:
            h_1, h_2 = final_state[0], final_state[1]
            #X = h_1 + h_2                # Add both states (requires changes to the input size of first linear layer)
            X = torch.cat((h_1, h_2), 1)  # Concatenate both states

"""

If the RNN type is LSTM with 2 layers and it is bidirectional, shouldn't it be:

```python
X = torch.cat((final_state[-1][0], final_state[-1][1]), 1)
```

chrisvdweth commented 4 years ago

Rajesh, thanks for your feedback! There's always a good chance that something's off. I use PyTorch, but more as a means to an end for my research; I use ML, but I'm not an ML researcher.

Your example `torch.cat((final_state[-1][0], final_state[-1][1]), 1)` seems to do the same as my code, just in one line. I apply the `[-1]` that selects the last layer directly after the `view()` call, and the `[0]` and `[1]` that select the two directions at `h_1, h_2 = final_state[0], final_state[1]`.

I'm only more verbose for readability and so I can try different ways of combining the two directions. Otherwise, I don't see any difference.
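The claimed equivalence is straightforward to check with a small sketch. NumPy stands in for PyTorch tensors here (on a contiguous array, `reshape` and indexing behave like `view`); the layer, batch, and hidden sizes are made up, but the layout follows PyTorch's `(num_layers * num_directions, batch, hidden_dim)` convention for LSTM hidden states:

```python
import numpy as np

# Hypothetical sizes for a 2-layer bidirectional LSTM
num_layers, num_directions, batch, hidden_dim = 2, 2, 3, 4

# h_n as returned by PyTorch: (num_layers * num_directions, batch, hidden_dim)
h_n = np.random.rand(num_layers * num_directions, batch, hidden_dim)

# rnn.py's verbose version: reshape, take the last layer with [-1],
# then pull out the two directions and concatenate them.
final_state = h_n.reshape(num_layers, num_directions, batch, hidden_dim)[-1]
h_1, h_2 = final_state[0], final_state[1]
X_verbose = np.concatenate((h_1, h_2), axis=1)

# The one-liner from the issue: index the last layer and both
# directions inline, then concatenate in a single expression.
layered = h_n.reshape(num_layers, num_directions, batch, hidden_dim)
X_oneliner = np.concatenate((layered[-1][0], layered[-1][1]), axis=1)

assert X_verbose.shape == (batch, 2 * hidden_dim)
assert np.array_equal(X_verbose, X_oneliner)
```

Both paths select the same last-layer slice and concatenate the forward and backward states along the feature axis, yielding a `(batch, 2 * hidden_dim)` tensor either way.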