danielegrattarola / spektral

Graph Neural Networks with Keras and Tensorflow 2.
https://graphneural.network
MIT License

model.evaluate() reproducibility problem #432

Open rmrmg opened 1 year ago

rmrmg commented 1 year ago

I tried to mimic this example https://github.com/danielegrattarola/spektral/blob/master/examples/node_prediction/citation_gcn.py with custom data (multiple graphs and a BatchLoader) and a regression task; the full Python script is in the attached file batchGCN.py.txt. The crucial part of the code:

    data = TestData(normalize_x=True, transforms=[LayerPreprocess(GCNConv)])
    idxs = numpy.random.permutation(len(data))
    pivot = int(0.8 * len(data))
    idx_tr, idx_te = numpy.split(idxs, [pivot, ])
    data_tr = data[idx_tr]
    data_test = data[idx_te]
    loader_tr = BatchLoader(data_tr)
    loader_test = BatchLoader(data_test)

    N = 100 #data.n_nodes  # Number of nodes in the graph
    F = data.n_node_features
    x_in = Input(shape=(F,))
    a_in = Input((N,), sparse=True)
    output = GCNConv(1, activation="relu", use_bias=False)([x_in, a_in])

    # Build model
    model = Model(inputs=[x_in, a_in], outputs=output)
    optimizer = Adam(learning_rate=0.003)
    model.compile(optimizer=optimizer, loss="mse", weighted_metrics=["acc"])
    model.summary()

    model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch,
              validation_data=loader_test.load(), validation_steps=loader_test.steps_per_epoch,
              epochs=epochs, callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)])

    for i in range(5):
        loss = model.evaluate(loader_test, steps=loader_test.steps_per_epoch)
        print("LOST", loss)
  1. I have a question: what should N be in my case? I guess it should be at least the size of the biggest graph, and a higher value just means extra padding and nothing more than extra training time. Is my guess correct? What happens when N is smaller than the number of nodes? (See the padding sketch after the logs below.)

  2. And the problem: multiple runs of the same model.evaluate() give different results, and this does not depend on N (i.e. for every tested N, bigger or smaller than the biggest graph, the fluctuations are observed). Is this a bug in my code or an issue with Spektral? epochs=20, N=10

    29/29 [==============================] - 1s 16ms/step - loss: 4.3498 - acc: 0.0000e+00
    LOST [4.349807262420654, 0.0]
    29/29 [==============================] - 0s 13ms/step - loss: 4.2849 - acc: 0.0000e+00
    LOST [4.284859657287598, 0.0]
    29/29 [==============================] - 1s 17ms/step - loss: 4.2611 - acc: 0.0000e+00
    LOST [4.26109504699707, 0.0]
    29/29 [==============================] - 0s 11ms/step - loss: 4.3629 - acc: 0.0000e+00
    LOST [4.3628973960876465, 0.0]
    29/29 [==============================] - 0s 13ms/step - loss: 4.2602 - acc: 0.0000e+00
    LOST [4.260205268859863, 0.0]

    epochs=20 N=100

    29/29 [==============================] - 1s 16ms/step - loss: 4.0418 - acc: 0.0000e+00
    LOST [4.041775226593018, 0.0]
    29/29 [==============================] - 0s 16ms/step - loss: 4.1152 - acc: 0.0000e+00
    LOST [4.115159511566162, 0.0]
    29/29 [==============================] - 1s 15ms/step - loss: 3.9473 - acc: 0.0000e+00
    LOST [3.947335720062256, 0.0]
    29/29 [==============================] - 0s 13ms/step - loss: 3.9188 - acc: 0.0000e+00
    LOST [3.9188196659088135, 0.0]
    29/29 [==============================] - 1s 16ms/step - loss: 3.9684 - acc: 0.0000e+00
    LOST [3.968449115753174, 0.0]
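
To illustrate what I mean by padding in question 1, here is a rough NumPy sketch of my understanding (pad_graph is just an illustrative helper I made up, not a Spektral function):

    import numpy as np

    def pad_graph(x, a, N):
        # Zero-pad node features (n, F) and adjacency (n, n) up to N nodes.
        n, F = x.shape
        x_pad = np.zeros((N, F), dtype=x.dtype)
        a_pad = np.zeros((N, N), dtype=a.dtype)
        x_pad[:n] = x
        a_pad[:n, :n] = a
        return x_pad, a_pad

    # A 3-node graph padded to N=5: the extra rows/columns are all zeros.
    x_pad, a_pad = pad_graph(np.ones((3, 2)), np.eye(3), N=5)
    print(x_pad.shape, a_pad.shape)  # (5, 2) (5, 5)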
danielegrattarola commented 1 year ago

This is a better starting point for a batch-mode model; can you try adapting your code/model to this example instead? https://github.com/danielegrattarola/spektral/blob/master/examples/graph_prediction/qm9_ecc_batch.py

Cheers

rmrmg commented 1 year ago

Yes, I can, but I'm not sure how to do this - what is the goal and what should be adapted? Here are my thoughts:

  1. I don't have edge properties, so ECCConv is a rather questionable starting point, but that is probably beside the point and I can take GCNConv instead.
  2. In the qm9 model the last layer is Dense(n_out); this is nice for a global (aka graph-level) property, but I want to learn node properties, so it is rather not for me.
  3. Based on 1 and 2, I think I can stay with a 1-layer GCNConv network (at least for test purposes).
  4. Masking: mask=True in the Loader and then self.masking = GraphMasking() and x = self.masking(x) - this for sure helps (I have a problem with the trained model predicting 0 for all nodes; using the mask probably solves this).

So can I change the model definition (as in 1 and 2) in the qm9 example and use my loader?

danielegrattarola commented 1 year ago

For 1 and 2, you don't have to use the same model, but I think it would be easier for you to start from that code since you were struggling with batch mode.

I also suggest using model subclassing as in the batch mode example, instead of the old functional API of Keras that you are using in your code.

Cheers

rmrmg commented 1 year ago

Thanks for the reply. I did the following:

import numpy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from spektral.data import BatchLoader
from spektral.layers import GCNConv, GraphMasking
from spektral.transforms import LayerPreprocess

import catalset  # my own dataset module

class GNN(Model):
    def __init__(self):
        super().__init__()
        self.masking = GraphMasking()
        self.conv1 = GCNConv(1, activation="relu")

    def call(self, inputs):
        x, a = inputs
        x = self.masking(x)
        output = self.conv1([x, a])
        return output

def train(model, learning_rate=1e-2, epochs=20):
    optimizer = Adam(learning_rate)
    model.compile(optimizer=optimizer, loss="mse")
    data = catalset.PDPData(normalize_x=True, transforms=[LayerPreprocess(GCNConv)])
    idxs = numpy.random.permutation(len(data))
    pivot = int(0.8 * len(data))
    idx_tr, idx_te = numpy.split(idxs, [pivot, ])
    data_tr = data[idx_tr]
    data_test = data[idx_te]
    batch_size = 10
    loader_tr = BatchLoader(data_tr, mask=False, batch_size=batch_size)
    loader_test = BatchLoader(data_test, mask=False, batch_size=batch_size)

    model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch, epochs=epochs)
    print("Testing model")
    loss = model.evaluate(loader_test.load(), steps=loader_test.steps_per_epoch)
    print("Done. Test loss: {}".format(loss))

if __name__ == "__main__":
    model = GNN()
    train(model)

and this ends up with:

 File "/home/rmrmg/anaconda3/envs/alfabet/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

This is because loader.load() returns: [(<class 'numpy.ndarray'>, (10, 50, 6), dtype('float32')), (<class 'numpy.ndarray'>, (10, 50, 50), dtype('float64'))] (<class 'numpy.ndarray'> (10,) object) ], or in plain English: the second element (Y) has dtype object. In PDPData, read() builds the graphs with graphs.append(Graph(x=x, a=csr_matrix(a), y=y)) for each (x, a, y) in full_data, where y has shape (N, 1) and N is the number of nodes in the graph (I also tried (N,), but the effect was the same).

Two years ago you wrote here that "BatchLoader only supports graph-level labels (meaning that labels do not get zero-padded -- that would not make sense) so all labels should have the same shape". So I changed BatchLoader to DisjointLoader in the code presented above and the model class to

class GNN(Model):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(1, activation="relu")

    def call(self, inputs):
        x, a, _ = inputs
        output = self.conv1([x, a])
        return output

and I got what I think is a somewhat similar problem caused by the dtype=object of Y - the errors for both versions (when y in the Graph constructor has shape (N,) and when it has shape (N, 1)) are attached: error_N.1.txt error_N.txt. I am lost and think the project needs more documentation.

danielegrattarola commented 1 year ago

If the labels have dtype object, it likely means that they cannot be stacked; this is typical NumPy behavior. Have you checked the contents of y and made sure that all of them have the same size?
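
To illustrate with plain NumPy (nothing Spektral-specific): stacking per-graph label arrays of different lengths gives an object array instead of a regular numeric one, which is roughly what seems to be happening to your y:

import numpy as np

# Labels for two graphs with different numbers of nodes:
y1 = np.zeros((3, 1))  # 3 nodes
y2 = np.zeros((5, 1))  # 5 nodes

ragged = np.array([y1, y2], dtype=object)  # ragged -> object array of arrays
print(ragged.dtype, ragged.shape)          # object (2,)

# With equal sizes, stacking gives a regular float array:
same = np.array([np.zeros((3, 1)), np.zeros((3, 1))])
print(same.dtype, same.shape)              # float64 (2, 3, 1)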

Anyway, the loaders are there to simplify users' lives, but if they become a problem you can always write your data loading pipeline from scratch so that you have full control over it. Writing a training loop in TF is pretty easy nowadays; there's an example here.
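
Roughly, such a loop could look like this (a minimal sketch; model, epochs, and the batches iterable are assumed to come from your own code, and the batching/padding logic is up to you):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(inputs, y):
    # One gradient step on a single batch.
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_fn(y, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

for epoch in range(epochs):
    for inputs, y in batches:  # batches produced by your own pipeline
        loss = train_step(inputs, y)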

The issue you mentioned is no longer relevant, and as you see in the documentation:

If node_level=False, the labels are interpreted as graph-level labels and are returned as an array of shape [batch, n_labels]. If node_level=True, then the labels are padded along the node dimension and are returned as an array of shape [batch, n_max, n_labels].
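
So, for node-level regression targets, something along these lines should work (a rough sketch based on the docs above; with mask=True the node features are extended with a binary mask column, which is what GraphMasking in the batch-mode example consumes):

loader_tr = BatchLoader(data_tr, batch_size=10, node_level=True, mask=True)
loader_test = BatchLoader(data_test, batch_size=10, node_level=True, mask=True)

model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch, epochs=epochs)
loss = model.evaluate(loader_test.load(), steps=loader_test.steps_per_epoch)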

Cheers