google-deepmind / graph_nets

Build Graph Nets in Tensorflow
https://arxiv.org/abs/1806.01261
Apache License 2.0

Issue with understanding #153

Open Nick97Ohm opened 1 year ago

Nick97Ohm commented 1 year ago

I am currently trying to learn how Graph Neural Networks work, but I am stuck for days with my understanding of this topic. Maybe someone of you can help me out.

I am using Zachary's Karate Club as the graph dataset, where the goal is to perform node classification to determine which node (person) is loyal to which instructor (Node 0 or Node 33).

For this purpose I am using the InteractionNetwork module with Linear modules for the node and edge updates. I assumed (and maybe this is where I misunderstood something about GNNs) that if I put a sigmoid activation function after the node update, the nodes would end up with either 0 (loyal to Node 0) or 1 (loyal to Node 33) as values. But instead I get arbitrary floating-point values.

Below is the code that I am using:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tree
from graph_nets import modules
from graph_nets import utils_np

import networkx as nx
import numpy as np
import sonnet as snt
import tensorflow as tf

# building a GraphsTuple from the karate club dataset
# get the dataset from networkx
karate_graph = nx.karate_club_graph()

# node features: label the two instructors, mark everyone else as unknown
# (networkx numbers the karate club nodes 0..33)
nodes = []
for i in range(34):
    if i == 0:
        nodes.append(0)    # loyal to Node 0
    elif i == 33:
        nodes.append(1)    # loyal to Node 33
    else:
        nodes.append(-1)   # unlabeled
nodes = np.reshape(nodes, (len(nodes), 1))
nodes_float = nodes.astype(np.float64)

# senders and receivers: networkx returns each edge only once,
# so add both directions to make the message passing undirected
directed_edges = list(karate_graph.edges)
senders = [u for u, v in directed_edges] + [v for u, v in directed_edges]
receivers = [v for u, v in directed_edges] + [u for u, v in directed_edges]

# edge features: one dummy scalar per (directed) edge
edges = [[0.0] for _ in range(len(senders))]

# create a GraphsTuple from the collected information
data_dict = {
    "nodes": nodes_float,
    "edges": edges,
    "senders": senders,
    "receivers": receivers
}

graphs_tuple = utils_np.data_dicts_to_graphs_tuple([data_dict])
graphs_tuple = tree.map_structure(lambda x: tf.constant(x) if x is not None else None, graphs_tuple)

# defining graph network
graph_network = modules.InteractionNetwork(
    node_model_fn=lambda: snt.Sequential([snt.Linear(output_size=1), tf.nn.sigmoid]),
    edge_model_fn=lambda: snt.Sequential([snt.Linear(output_size=1)])
)

# optimizer and loss function
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
# the node model already applies a sigmoid, so its outputs are
# probabilities, not logits
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=False)

# learning loop
for epoch in range(50):
    with tf.GradientTape() as tape:

        output_graph = graph_network(graphs_tuple)

        # loss only on the labeled nodes (0 and 33); the targets are
        # the original node features themselves
        labeled_indices = [0, 33]
        loss = loss_fn(tf.gather(graphs_tuple.nodes, labeled_indices),
                       tf.gather(output_graph.nodes, labeled_indices))

    # calculate gradient
    gradients = tape.gradient(loss, graph_network.trainable_variables)

    # apply gradient
    optimizer.apply_gradients(zip(gradients, graph_network.trainable_variables))

    # Loss output
    print("Epoch %d | Loss: %.4f" % (epoch, loss.numpy()))

print(output_graph.nodes)
print(output_graph.edges)

This is the output that I get:

Loss function:

Epoch 0 | Loss: 0.6619
Epoch 1 | Loss: 0.6547
Epoch 2 | Loss: 0.6478
Epoch 3 | Loss: 0.6412
Epoch 4 | Loss: 0.6351
Epoch 5 | Loss: 0.6292
Epoch 6 | Loss: 0.6233
Epoch 7 | Loss: 0.6172
Epoch 8 | Loss: 0.6110
Epoch 9 | Loss: 0.6048
Epoch 10 | Loss: 0.5988
Epoch 11 | Loss: 0.5931
Epoch 12 | Loss: 0.5877
Epoch 13 | Loss: 0.5826
Epoch 14 | Loss: 0.5777
Epoch 15 | Loss: 0.5728
Epoch 16 | Loss: 0.5680
Epoch 17 | Loss: 0.5633
Epoch 18 | Loss: 0.5589
Epoch 19 | Loss: 0.5549

Nodes:

 [[0.09280719]
 [0.04476126]
 [0.03025987]
 [0.13695013]
 [0.34953291]
 [0.26353402]
 [0.26353402]
 [0.26353402]
 [0.22878334]
 [0.47378198]
 [0.34953291]
 [0.54787342]
 [0.44657832]
 [0.22878334]
 [0.47378198]
 [0.47378198]
 [0.41969087]
 [0.44657832]
 [0.47378198]
 [0.40082739]
 [0.47378198]
 [0.44657832]
 [0.47378198]
 [0.21003225]
 [0.32505647]
 [0.32505647]
 [0.47378198]
 [0.28533633]
 [0.37482885]
 [0.28533633]
 [0.28533633]
 [0.16495933]
 [0.01520448]
 [0.83080503]], shape=(34, 1), dtype=float64)

I did not include the edges, because I don't think they are relevant to this issue and it would be too much information.

alvarosg commented 1 year ago

Hi! Sigmoid is actually a smooth continuous function, so it is expected that you get floating point numbers.

At train time you can use that floating value as the "mean" parameter of a Bernoulli distribution and maximize the log likelihood (which is equivalent to what you are doing when minimizing the BinaryCrossentropy).
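To make that equivalence concrete, here is a quick numeric check (illustrative only, not graph_nets API):

import tensorflow as tf

p = tf.constant([[0.8]])  # sigmoid output, interpreted as the Bernoulli mean
y = tf.constant([[1.0]])  # binary target
bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)(y, p)
nll = -tf.reduce_mean(y * tf.math.log(p) + (1.0 - y) * tf.math.log(1.0 - p))
print(bce.numpy(), nll.numpy())  # both ~0.2231: BCE is the negative Bernoulli log likelihood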

At evaluation time, you can either sample from the Bernoulli distribution, or use the greedy approach of just rounding to 0 or 1: e.g. checking whether the predicted probability is > 0.5 returns True or False (which you can then cast to an integer to get 0 or 1).
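For example, a minimal sketch of both options, assuming output_graph is the result of your forward pass above:

probs = output_graph.nodes  # sigmoid outputs in (0, 1), shape (34, 1)

# greedy: round each probability to the nearest label
hard_labels = tf.cast(probs > 0.5, tf.int32)

# stochastic: sample each label from the implied Bernoulli distribution
uniform = tf.random.uniform(tf.shape(probs), dtype=probs.dtype)
sampled_labels = tf.cast(uniform < probs, tf.int32)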


Nick97Ohm commented 1 year ago

Thanks a lot, then it was indeed a misunderstanding on my side. I thought sigmoid would act like a step function.

This raises now another question:

I changed the value of the "unlabeled" nodes to 0.5, because with -1 the nodes had an inclination to be classified with label 0. I thought the value 0.5 could be interpreted as "not sure whether loyal to Node 0 or Node 33", but the results still stay near 0.5, where nodes close to Node 0 are sometimes above 0.5 and nodes close to Node 33 below. Concretely, the only change was in the labeling loop, as in the sketch below.
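(The 0.5 for unlabeled nodes is just my own choice of an "undecided" value, nothing prescribed by graph_nets:)

nodes = []
for i in range(34):
    if i == 0:
        nodes.append(0.0)   # labeled: loyal to Node 0
    elif i == 33:
        nodes.append(1.0)   # labeled: loyal to Node 33
    else:
        nodes.append(0.5)   # unlabeled: "undecided" prior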

Shouldn't at least the neighbors of the labeled nodes be pulled towards those labels after the first message-passing layer?

My guess is that, since the other neighbors are also labeled with 0.5, this pulls their results toward 0.5 as well. But how can I fix that, if that is the case?

I really appreciate your help!