Closed ghost closed 3 years ago
Hi,
this is a known issue with the TUDataset benchmarks: most methods will have a crazy high variance in performance. That script is meant as a general template for building your own scripts; I wouldn't use it to report results in a paper (unless it is used as part of a suitable cross-validation pipeline).
If you replace the dataset with one of the OGB benchmarks, the results should stabilize. Please let me know if this is not the case, as it might be a problem with the code.
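As a rough illustration of what a "suitable cross-validation pipeline" could look like, here is a minimal sketch that splits sample indices into k folds and reports the mean and standard deviation of the test score over the folds. The names k_fold_indices, cross_validate and train_and_evaluate are hypothetical and not part of Spektral; train_and_evaluate stands in for whatever training script you use.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle range(n_samples) and split it into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, k, train_and_evaluate, seed=0):
    """Run k-fold cross-validation and return (mean, std) of the test scores.

    train_and_evaluate(train_idx, test_idx) is a hypothetical callback that
    trains a fresh model on train_idx and returns one test score
    (e.g. accuracy) measured on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_evaluate(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the mean together with the standard deviation makes the variance visible instead of hiding it behind a single run.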
Cheers
Thanks for the assertion.
I just tried to run a simple Spektral example with an OGB dataset, but my setup seems to be problematic. In particular, using OGB's library-agnostic loader, the following snippet
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader
from spektral.models import GeneralGNN
from ogb.nodeproppred import NodePropPredDataset
dataset = NodePropPredDataset("ogbn-proteins")
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]
np.random.seed(0)
batch_size = 16
learning_rate = 0.0001
epochs = 100
loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)
model = GeneralGNN(dataset.labels, activation="softmax")
optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)
history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)
plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
plt.show()
yields the error below:
Traceback (most recent call last):
File "C:/Users/Matinking/PycharmProjects/RL/GNN_spektral_OGB.py", line 11, in <module>
from ogb.nodeproppred import NodePropPredDataset
File "C:\Users\Matinking\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\__init__.py", line 2, in <module>
from .dataset import NodePropPredDataset
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 5, in <module>
from ogb.io.read_graph_raw import read_csv_graph_raw, read_csv_heterograph_raw,\
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\__init__.py", line 1, in <module>
from .save_dataset import DatasetSaver
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\save_dataset.py", line 1, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
Have you ever succeeded in using OGB datasets in Spektral with the TensorFlow backend rather than torch?
To load an OGB dataset you need torch installed; I don't think there's a way around that, unfortunately.
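A minimal sketch for checking this up front, assuming only the standard library: it reports which of the required packages cannot be found before you try to import ogb. missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# ogb imports torch at import time (see the traceback above), so check first.
missing = missing_modules(["torch", "ogb"])
if missing:
    print("Install these packages before loading OGB datasets:", ", ".join(missing))
```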
Cheers
According to the OGB developers' advice, once torch is installed one can use OGB datasets with TensorFlow as a backend. However, integrating OGB datasets with Spektral seems to be problematic. As you can see in the discussion here, the OGB developers have no clear idea why errors occur when feeding OGB datasets to Spektral models. Here are two examples and their corresponding errors:
Example 1: using ogbn-proteins
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader
from spektral.models import GeneralGNN
from ogb.nodeproppred import NodePropPredDataset
dataset = NodePropPredDataset("ogbn-proteins")
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]
np.random.seed(0)
batch_size = 16
learning_rate = 0.0001
epochs = 100
loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)
model = GeneralGNN(dataset.labels, activation="softmax")
optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)
history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)
plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
plt.show()
Error:
Using backend: pytorch
Traceback (most recent call last):
File "~/PycharmProjects/RL/GNN_spektral_OGB.py", line 13, in <module>
dataset = NodePropPredDataset("ogbn-proteins")
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 63, in __init__
self.pre_process()
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 70, in pre_process
loaded_dict = torch.load(pre_processed_file_path)
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
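"EOFError: Ran out of input" is what Python's pickle raises when it is asked to load an empty file. One possible cause here (an assumption, nothing in this thread confirms it) is an empty or truncated cached file left behind by an interrupted download or preprocessing run. A minimal sketch that reproduces the error and looks for zero-byte files under a folder; find_empty_files is a hypothetical helper, and the folder to scan in practice would be your OGB root directory (assumed to be "dataset" if you did not pass root).

```python
import os
import pickle
import tempfile


def find_empty_files(root):
    """Return the paths of all zero-byte files under `root`."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return empty


# Demonstration: unpickling an empty file raises the same EOFError.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty.bin")
    open(empty_path, "wb").close()
    try:
        with open(empty_path, "rb") as f:
            pickle.load(f)
    except EOFError as err:
        print("EOFError:", err)
    print(find_empty_files(tmp))
```

If such a file turns up, removing the dataset folder and letting OGB download and preprocess it again is worth trying.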
Example 2: using ogbg-molhiv
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader
from spektral.models import GeneralGNN
from ogb.graphproppred import GraphPropPredDataset
dataset = GraphPropPredDataset(name="ogbg-molhiv")
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]
np.random.seed(0)
batch_size = 16
learning_rate = 0.0001
epochs = 100
loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)
model = GeneralGNN(dataset.labels, activation="softmax")
optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)
history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)
plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
plt.show()
Error:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1664, in <module>
main()
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "~/PycharmProjects/RL/GNN_spektral_OGB.py", line 27, in <module>
model = GeneralGNN(dataset.labels, activation="softmax")
File "~AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py", line 158, in __init__
activation,
File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py", line 216, in __init__
self.mlp.add(Dense(hidden if i < layers - 1 else output))
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\layers\core.py", line 1166, in __init__
self.units = int(units) if not isinstance(units, int) else units
TypeError: only size-1 arrays can be converted to Python scalars
Thus, can you please share a working example of a Spektral model in which an OGB dataset is used?
Hi,
the first error is entirely OGB code, while the second error is probably due to the way you are passing dataset.labels as the number of output units when creating the model (dataset.labels is an array of labels, whereas the class expects an integer -- you can have a look at the documentation to know what each class/method in Spektral expects as input: https://graphneural.network/models/#generalgnn)
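To make the second error concrete, here is a minimal sketch with a toy array standing in for dataset.labels. It shows that converting a multi-element array to an int fails with a TypeError, and one way to derive an integer number of output units from the array's shape. n_output_units is a hypothetical helper, not a Spektral function.

```python
import numpy as np


def n_output_units(labels):
    """Return the number of label columns as a plain Python int."""
    labels = np.asarray(labels)
    # A 1-D label vector has a single target per sample.
    return 1 if labels.ndim == 1 else int(labels.shape[-1])


labels = np.array([[0], [1], [1]])  # toy stand-in for dataset.labels

try:
    int(labels)  # roughly what happens when an array is passed as `units`
except TypeError as err:
    print("TypeError:", err)

print(n_output_units(labels))
```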
Also, I should note that:
spektral.data.Dataset object)
Cheers
Thanks. So, I ended up with the following code:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader
from spektral.models import GeneralGNN
from spektral.datasets.ogb import OGB
from ogb.graphproppred import GraphPropPredDataset
ogb_dataset = GraphPropPredDataset(name="ogbg-molhiv")
dataset = OGB(ogb_dataset)
idx = ogb_dataset.get_idx_split()
idx_tr, idx_va, idx_te = idx["train"], idx["valid"], idx["test"]
dataset_tr = dataset[idx_tr]
dataset_va = dataset[idx_va]
dataset_te = dataset[idx_te]
np.random.seed(0)
batch_size = 16
learning_rate = 0.0001
epochs = 100
loader_tr = DisjointLoader(dataset_tr, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(dataset_te, batch_size=batch_size, epochs=1)
model = GeneralGNN(dataset.n_labels, activation="softmax")
optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)
history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)
and I get the following error:
Traceback (most recent call last):
File "~/PycharmProjects/RL/test_OGB_spektral.py", line 80, in <module>
history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py", line 1183, in fit
tmp_logs = self.train_function(iterator)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 889, in __call__
result = self._call(*args, **kwds)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 933, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 764, in _initialize
*args, **kwds))
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py", line 3050, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py", line 3444, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\function.py", line 3289, in _create_graph_function
capture_by_value=self._capture_by_value),
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\func_graph.py", line 999, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\def_function.py", line 672, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\func_graph.py", line 986, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\training.py:855 train_function *
return step_function(self, iterator)
~\AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py:166 call *
x, a, i = inputs
ValueError: too many values to unpack (expected 3)
The problem here is that in the examples you supplied, the models are created based on three values: dataset.n_node_features, dataset.n_edge_features and dataset.n_labels. However, I can only pass dataset.n_labels as the output to the GeneralGNN constructor. If that's the case, can you please explain how I can feed those values to GeneralGNN?
The issue here is that "molhiv" is a dataset that has edge attributes, but GeneralGNN does not use edge attributes and expects only (x, a, i) as input.
You can either change the dataset or implement a model similar to GeneralGNN which is designed to discard edge attributes. Something like:
class MyGeneralGNN(GeneralGNN):
    def call(self, inputs):
        x, a, e, i = inputs
        return super().call([x, a, i])
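The unpacking in that subclass can be tried out without TensorFlow. A minimal sketch, assuming the loader yields the inputs as (x, a, e, i) when edge attributes are present and as (x, a, i) otherwise; drop_edge_features is a hypothetical helper, not part of Spektral.

```python
def drop_edge_features(inputs):
    """Return (x, a, i) from either (x, a, i) or (x, a, e, i)."""
    if len(inputs) == 4:
        x, a, _e, i = inputs  # discard the edge attributes
        return x, a, i
    if len(inputs) == 3:
        return tuple(inputs)
    raise ValueError(f"expected 3 or 4 inputs, got {len(inputs)}")
```

Accepting both input lengths would let the same model run on datasets with and without edge attributes, whereas the subclass above handles only the four-input case.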
Hi Daniele and Jack,
I have a question regarding this example. In particular, running the code produces the output in this file: data.txt.
However, if one plots the data above, the test accuracy shows neither convergence nor stability: test_acc oscillates severely. Can you please explain how this learning process can be considered normal when the accuracy of the model does not increase overall from one epoch to the next?