danielegrattarola / spektral

Graph Neural Networks with Keras and Tensorflow 2.
https://graphneural.network
MIT License

On the learning stability of the results of the general_gnn.py #256

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hi Daniele and Jack,

I have a question regarding this example. In particular, running the code yields the attached log file: data.txt.

However, if one plots the data above, say,

import matplotlib.pyplot as plt

# Parse the log: column 1 holds the epoch, column 15 the test accuracy
epoch, test_acc = [], []
for line in open('data.txt', 'r'):
    values = line.split()
    epoch.append(int(values[1]))
    test_acc.append(float(values[15]))

plt.figure(figsize=(20, 5))
plt.plot(epoch, test_acc)
plt.xlabel('Epoch')
plt.ylabel('Test accuracy')
plt.legend(["test_acc"], loc="lower right")

plt.show()

there is no convergence or stability in the test accuracy:

[Figure: test accuracy per epoch, oscillating heavily with no overall upward trend]

Here test_acc oscillates severely. Can you please explain why this learning process is considered normal when the model's accuracy does not increase overall from one epoch to the next?

danielegrattarola commented 3 years ago

Hi,

this is a known issue with the TUDataset benchmarks: most methods will have a crazy high variance in performance. That script is meant to give a general template for building your own scripts; I wouldn't use it to report results in a paper (unless it is part of a suitable cross-validation pipeline).

If you replace the dataset with one of the OGB benchmarks, the results should stabilize. Please let me know if this is not the case, as it might be a problem with the code.
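
For reference, here is a minimal sketch of what such a cross-validation pipeline could look like (assuming the TUDataset/GeneralGNN setup of the example; the number of folds and all hyperparameters are only placeholders):

import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.losses import CategoricalCrossentropy

from spektral.data import DisjointLoader
from spektral.datasets import TUDataset
from spektral.models import GeneralGNN

dataset = TUDataset("PROTEINS")

accs = []
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for idx_tr, idx_te in kf.split(np.arange(len(dataset))):
    # Re-create the loaders and the model for every fold
    loader_tr = DisjointLoader(dataset[idx_tr], batch_size=32, epochs=100)
    loader_te = DisjointLoader(dataset[idx_te], batch_size=32)
    model = GeneralGNN(dataset.n_labels, activation="softmax")
    model.compile("adam", CategoricalCrossentropy(), metrics=["categorical_accuracy"])
    model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch, epochs=100)
    loss, acc = model.evaluate(loader_te.load(), steps=loader_te.steps_per_epoch)
    accs.append(acc)

print("Test accuracy: {:.3f} +- {:.3f}".format(np.mean(accs), np.std(accs)))

Reporting the mean and standard deviation over the folds gives a much better picture than any single run.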

Cheers

ghost commented 3 years ago

Thanks for the clarification.

I just tried to test a simple Spektral example with an OGB dataset, but it seems my setup is problematic. In particular, using ogb's library-agnostic loader, the following snippet

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from spektral.data import DisjointLoader
from spektral.models import GeneralGNN

from ogb.nodeproppred import NodePropPredDataset

dataset = NodePropPredDataset("ogbn-proteins")
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]

np.random.seed(0)

batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)

model = GeneralGNN(dataset.labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)

history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
plt.show()

yields the error below:

Traceback (most recent call last):
  File "C:/Users/Matinking/PycharmProjects/RL/GNN_spektral_OGB.py", line 11, in <module>
    from ogb.nodeproppred import NodePropPredDataset
  File "C:\Users\Matinking\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\__init__.py", line 2, in <module>
    from .dataset import NodePropPredDataset
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 5, in <module>
    from ogb.io.read_graph_raw import read_csv_graph_raw, read_csv_heterograph_raw,\
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\__init__.py", line 1, in <module>
    from .save_dataset import DatasetSaver
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\save_dataset.py", line 1, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

Have you ever succeeded in using OGB datasets in Spektral with the TensorFlow backend, rather than torch?

danielegrattarola commented 3 years ago

To load an OGB dataset you need torch installed; I don't think there's a way around that, unfortunately.

Cheers

ghost commented 3 years ago

According to the OGB developers, it seems that if torch is installed, one can use OGB datasets with TensorFlow as the backend. However, the integration of OGB datasets with Spektral seems to be problematic. As you can see from the discussion here, the OGB developers have no clear clue why one gets errors when feeding OGB datasets to Spektral models. Here are two examples and their corresponding errors:

Example 1: using ogbn-proteins

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from spektral.data import DisjointLoader
from spektral.models import GeneralGNN

from ogb.nodeproppred import NodePropPredDataset

dataset = NodePropPredDataset("ogbn-proteins")
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]

np.random.seed(0)

batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)

model = GeneralGNN(dataset.labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)

history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
plt.show()

Error:

Using backend: pytorch
Traceback (most recent call last):
  File "~/PycharmProjects/RL/GNN_spektral_OGB.py", line 13, in <module>
    dataset = NodePropPredDataset("ogbn-proteins")
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 63, in __init__
    self.pre_process()
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 70, in pre_process
    loaded_dict = torch.load(pre_processed_file_path)
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Example 2: using ogbg-molhiv

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from spektral.data import DisjointLoader
from spektral.models import GeneralGNN

from ogb.graphproppred import GraphPropPredDataset

dataset = GraphPropPredDataset(name="ogbg-molhiv")

split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]

np.random.seed(0)

batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)

model = GeneralGNN(dataset.labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)

history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
plt.show()

Error:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1664, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "~/PycharmProjects/RL/GNN_spektral_OGB.py", line 27, in <module>
    model = GeneralGNN(dataset.labels, activation="softmax")
  File "~AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py", line 158, in __init__
    activation,
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py", line 216, in __init__
    self.mlp.add(Dense(hidden if i < layers - 1 else output))
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\layers\core.py", line 1166, in __init__
    self.units = int(units) if not isinstance(units, int) else units
TypeError: only size-1 arrays can be converted to Python scalars

Thus, can you please share a working example of a Spektral model that uses an OGB dataset?

danielegrattarola commented 3 years ago

Hi,

the first error is entirely within OGB code, while the second error is probably due to the way you are passing dataset.labels as the number of output units when creating the model: dataset.labels is an array of labels, whereas the class expects an integer. You can have a look at the documentation to see what each class/method in Spektral expects as input: https://graphneural.network/models/#generalgnn

Also, I should note that:

  1. You are using the DisjointLoader in an unexpected way (you're giving indices as input to something that expects a spektral.data.Dataset object).
  2. Spektral does not support OGB datasets directly; you have to wrap them in a specific loader for OGB (see the sketch below). Have you looked at these examples?
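
For instance, something along these lines (a sketch; batch size and epochs are placeholders):

# Sketch: wrap the OGB dataset so that Spektral's loaders receive a
# spektral.data.Dataset, and pass an integer number of outputs to the model
from spektral.data import DisjointLoader
from spektral.datasets.ogb import OGB
from spektral.models import GeneralGNN
from ogb.graphproppred import GraphPropPredDataset

ogb_dataset = GraphPropPredDataset(name="ogbg-molhiv")
dataset = OGB(ogb_dataset)

idx = ogb_dataset.get_idx_split()
loader_tr = DisjointLoader(dataset[idx["train"]], batch_size=16, epochs=100)

model = GeneralGNN(dataset.n_labels, activation="softmax")  # int, not an array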

Cheers

ghost commented 3 years ago

Thanks. So, I ended up with the following code:

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from spektral.data import DisjointLoader
from spektral.models import GeneralGNN

from spektral.datasets.ogb import OGB
from ogb.graphproppred import GraphPropPredDataset

ogb_dataset = GraphPropPredDataset(name="ogbg-molhiv")
dataset = OGB(ogb_dataset)

idx = ogb_dataset.get_idx_split()
idx_tr, idx_va, idx_te = idx["train"], idx["valid"], idx["test"]

dataset_tr = dataset[idx_tr]
dataset_va = dataset[idx_va]
dataset_te = dataset[idx_te]

np.random.seed(0)

batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(dataset_tr, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(dataset_te, batch_size=batch_size, epochs=1)

model = GeneralGNN(dataset.n_labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)

history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

and I get the following error:

Traceback (most recent call last):
  File "~/PycharmProjects/RL/test_OGB_spektral.py", line 80, in <module>
    history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py", line 1183, in fit
    tmp_logs = self.train_function(iterator)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 933, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 764, in _initialize
    *args, **kwds))
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py", line 3050, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py", line 3444, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py", line 3289, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\func_graph.py", line 999, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 672, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\func_graph.py", line 986, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    ~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py:855 train_function  *
        return step_function(self, iterator)
    ~\AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py:166 call  *
        x, a, i = inputs

    ValueError: too many values to unpack (expected 3)

The problem here is that in the examples you supplied, the models are created from three values: dataset.n_node_features, dataset.n_edge_features, and dataset.n_labels. However, I can only pass dataset.n_labels to the GeneralGNN constructor. If that's the case, can you please explain how I can feed those values to GeneralGNN?

danielegrattarola commented 3 years ago

The issue here is that "molhiv" is a dataset with edge attributes, but GeneralGNN expects only node attributes (x, a, i).

You can either change the dataset or implement a model similar to GeneralGNN that is designed to discard the edge attributes. Something like:

class MyGeneralGNN(GeneralGNN):
    def call(self, inputs):
        # Unpack the extra edge-attribute tensor e and discard it
        x, a, e, i = inputs
        return super().call([x, a, i])
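
Then you can use it as a drop-in replacement, e.g. (a sketch; note that steps_per_epoch for training should come from loader_tr, not loader_te):

model = MyGeneralGNN(dataset.n_labels, activation="softmax")
model.compile(loss=CategoricalCrossentropy(),
              optimizer=Adam(learning_rate),
              metrics=[categorical_accuracy])
# Training steps come from the training loader
history = model.fit(loader_tr.load(),
                    steps_per_epoch=loader_tr.steps_per_epoch,
                    epochs=epochs)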