apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Attempting to write a relation network in Gluon #8625

Open anjishnu opened 7 years ago

anjishnu commented 7 years ago

I was trying to implement a Relation Network (https://arxiv.org/abs/1706.01427) and ran into the error below. It seems to be related to using for loops within the forward pass, which I assumed would be supported since the API is similar to PyTorch.
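For reference, the paper defines a relation network over a set of objects $O = \{o_1, \dots, o_n\}$ as a sum of a pairwise function $g_\theta$ over all object pairs, followed by a function $f_\phi$:

$$
\mathrm{RN}(O) = f_\phi\Big(\sum_{i,j} g_\theta(o_i, o_j)\Big)
$$

In the implementation, $g_\theta$ corresponds to `relational_g1`/`relational_g2` and $f_\phi$ to `relational_f`. The objects $o_i$ are drawn from `z1` and $o_j$ from `z2`, so each of the $n^2$ pairs crosses the two sequences. That cross-sequence pairing is this implementation's choice, not the paper's single-set formulation.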

The implementation is below:

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn, rnn

class RecurrentRelational(gluon.Block):
    def __init__(self, dim=100, num_layers=1, layout='TNC',
                 **kwargs):
        super(RecurrentRelational, self).__init__(**kwargs)
        self.key = 'recurrent-relational'
        with self.name_scope():
            # layers created in name_scope will inherit name space
            # from parent layer.
            #self.dropout = nn.Dropout(0.3)
            self.hidden_size = dim
            self.num_layers = num_layers
            self.layout = layout
            # note: this single BatchNorm is reused in activate_relation and
            # activate_aggregation, so both share its running statistics
            self.bn = nn.BatchNorm()

            # Recurrent encoder (bidirectional, so outputs have 2 * dim features)
            self.rnn = rnn.RNN(self.hidden_size, self.num_layers,
                               layout=self.layout, bidirectional=True)
            # Relational Network
            self.g_hidden = 100
            self.relational_g1 = nn.Dense(self.g_hidden, activation='relu')
            self.relational_g2 = nn.Dense(self.g_hidden, activation='relu')

            self.relational_f = nn.Dense(100, activation='relu')
            # End RN

            self.binary = nn.Dense(2)

    def activate_relation(self, relation_vector):
        g_z = self.relational_g1(relation_vector)
        g_z = self.bn(g_z)
        g_z = self.relational_g2(g_z)
        return g_z

    def activate_aggregation(self, aggregation):
        return self.relational_f(self.bn(aggregation))

    def forward(self, inputs):
        # Python 3 does not allow tuple parameters, so unpack explicitly;
        # call the block as net((x1, x2))
        x1, x2 = inputs
        z1 = self.rnn(x1)
        z2 = self.rnn(x2)
        if self.layout == 'TNC':
            # RNN output is (seq_len, batch, 2 * dim); make it batch-major
            z1 = mx.nd.transpose(z1, axes=(1, 0, 2))
            z2 = mx.nd.transpose(z2, axes=(1, 0, 2))
        batch_size, seq_len, hidden_dim = z1.shape
        num_objects = seq_len
        num_pairs = num_objects * num_objects
        all_object_pairs = []

        for i in range(num_objects):
            first_object = z1[:, i, :]
            for j in range(num_objects):
                second_object = z2[:, j, :]
                relation_vector = mx.nd.concat(first_object, second_object, dim=1)
                all_object_pairs.append(relation_vector)

        # Rows are pair-major: pair 0 for every sample, then pair 1, and so on,
        all_relations = mx.nd.concat(*all_object_pairs, dim=0)
        # so reshape to (num_pairs, batch_size, g_hidden) and sum over pairs;
        # reshaping to (-1, num_pairs, g_hidden) would mix different samples
        z_rel = self.activate_relation(all_relations).reshape(
            (num_pairs, batch_size, self.g_hidden))
        z_agg = mx.nd.sum(z_rel, axis=0)
        return self.binary(self.activate_aggregation(z_agg))

The error I'm getting is

libc++abi.dylib: terminating with uncaught exception of type dmlc::Error: [16:57:57] src/engine/./threaded_engine.h:347: [16:57:57] src/operator/tensor/./matrix_op-inl.h:964: CropAssign only supports kWriteTo

Is there a different way to implement this that may avoid this issue?

I think I essentially need the equivalent of the code below, but with the all_relations array being a memory view of the original arrays rather than a copy. Does anyone know of a good tutorial or example of how to implement this with the NDArray API?

# fragment from inside forward(); z1 and z2 are (batch_size, seq_len, hidden_dim)
num_relations = num_objects * num_objects
#all_relations = []
# allocate on the same device as the inputs
all_relations = mx.nd.zeros((batch_size * num_relations, hidden_dim * 2),
                            ctx=z1.context)
for i in range(num_objects):
    first_object = z1[:, i, :]
    for j in range(num_objects):
        second_object = z2[:, j, :]
        relation_vector = mx.nd.concat(first_object, second_object, dim=1)
        # rows for pair (i, j) start at ((i * num_objects) + j) * batch_size
        start_index = ((i * num_objects) + j) * batch_size
        #all_relations.append(relation_vector)
        all_relations[start_index : start_index + batch_size] = relation_vector
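A loop-free alternative is to build every pair with broadcasting instead of slicing and in-place assignment. The sketch below uses NumPy so that it runs standalone. The function name `all_pairs` is hypothetical. It assumes, without having been tested here against MXNet, that `mx.nd.expand_dims`, `mx.nd.broadcast_to`, `mx.nd.concat` and `reshape` can stand in for the NumPy calls one-for-one. It keeps the same pair-major row order as the loop, row `((i * num_objects) + j) * batch_size + b`.

```python
import numpy as np

def all_pairs(z1, z2):
    """Hypothetical helper: build every (z1[:, i], z2[:, j]) pair without loops.

    z1, z2: arrays of shape (batch, n, d).
    Returns shape (n * n * batch, 2 * d), where row ((i * n) + j) * batch + b
    holds concat(z1[b, i], z2[b, j]), matching the nested loop's ordering.
    """
    batch, n, d = z1.shape
    # (batch, n, 1, d) -> (batch, n, n, d): first object varies along axis 1
    left = np.broadcast_to(z1[:, :, None, :], (batch, n, n, d))
    # (batch, 1, n, d) -> (batch, n, n, d): second object varies along axis 2
    right = np.broadcast_to(z2[:, None, :, :], (batch, n, n, d))
    pairs = np.concatenate([left, right], axis=3)   # (batch, n, n, 2d)
    # put the batch axis after the pair axes so rows come out pair-major
    pairs = pairs.transpose(1, 2, 0, 3)              # (n, n, batch, 2d)
    return pairs.reshape(n * n * batch, 2 * d)

batch, n, d = 2, 3, 4
rng = np.random.default_rng(0)
z1 = rng.standard_normal((batch, n, d))
z2 = rng.standard_normal((batch, n, d))

# Reference: the nested loop from the issue, written in NumPy
loop_rows = [np.concatenate([z1[:, i, :], z2[:, j, :]], axis=1)
             for i in range(n) for j in range(n)]
reference = np.concatenate(loop_rows, axis=0)
print(np.allclose(all_pairs(z1, z2), reference))  # True
```

Because every step is an ordinary differentiable operator rather than slice assignment into a preallocated buffer, the same structure should also sidestep the question of whether in-place writes are recorded for autograd.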
eric-haibin-lin commented 7 years ago

@reminisce can you help take a look at this issue?

anjishnu commented 7 years ago

Relevant thread: https://discuss.mxnet.io/t/cross-product-style-architectures-with-gluon/271/3

The model doesn't throw an exception on the latest mainline branch built from source, but I haven't gotten the network to produce anything other than the same prediction for every sample.
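One thing worth checking for the identical-predictions symptom is row ordering. This is a hypothesis, not a confirmed diagnosis. The rows produced by `mx.nd.concat(*all_object_pairs, dim=0)` are pair-major, so a reshape to `(-1, num_objects * num_objects, self.g_hidden)`, as in the forward pass originally posted, groups rows from different samples together. Reshaping to `(num_pairs, batch_size, g_hidden)` and summing over axis 0 keeps each sample separate. The NumPy sketch below stores each row's sample index as its feature to make the difference visible. The sizes are arbitrary.

```python
import numpy as np

batch, num_pairs = 3, 4
# Pair-major rows, as produced by concatenating along dim 0 inside the pair loop:
# row p * batch + b belongs to sample b. Store the sample index as the feature.
sample_of_row = np.tile(np.arange(batch), num_pairs)   # [0 1 2 0 1 2 ...]
features = sample_of_row[:, None].astype(float)         # (num_pairs * batch, 1)

# Reshape to (-1, num_pairs, g): the first "sample" mixes rows from samples 0, 1, 2
wrong = features.reshape(-1, num_pairs, 1)
print(wrong[0, :, 0])            # [0. 1. 2. 0.]

# Reshape to (num_pairs, batch, g): each column holds a single sample
right = features.reshape(num_pairs, batch, 1)
print(right[:, 0, 0])            # [0. 0. 0. 0.]
# Summing over pairs (axis 0) gives one aggregate per sample
print(right.sum(axis=0)[:, 0])   # [0. 4. 8.]
```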

It also throws an exception if I try to apply batch normalization to stabilize training.