gasteigerjo / ppnp

PPNP & APPNP models from "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019)
https://www.daml.in.tum.de/ppnp
MIT License

Two bugs in the networkx_to_sparsegraph function #16

Closed Void-JackLee closed 1 year ago

Void-JackLee commented 1 year ago

I'm using torch for training, and when I tried to convert a dataset to the sparsegraph defined in this project, it raised a numpy error.

/ppnp/pytorch/propagation.py:12, in calc_A_hat(adj_matrix)
     10 nnodes = adj_matrix.shape[0]
     11 A = adj_matrix + sp.eye(nnodes)
---> 12 D_vec = np.sum(A, axis=1).A1
     13 D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
     14 D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)

AttributeError: 'numpy.ndarray' object has no attribute 'A1'

I eventually traced the bug to the networkx_to_sparsegraph function.

# Extract adjacency matrix
adj_matrix = nx.adjacency_matrix(nx_graph)

The type of adj_matrix provided here is scipy.sparse._arrays.csr_array, not a sparse matrix!

Thus, np.sum(A, axis=1) produces a numpy.ndarray, which has no attribute 'A1'; only numpy.matrix has it. This is easy to fix with a try...except block. I modified the calc_A_hat(adj_matrix) function in propagation.py as below. 😊

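import numpy as np
import scipy.sparse as sp

def calc_A_hat(adj_matrix: sp.spmatrix) -> sp.spmatrix:
    nnodes = adj_matrix.shape[0]
    A = adj_matrix + sp.eye(nnodes)
    D_vec = np.sum(A, axis=1)
    try:
        # np.matrix (from spmatrix input): flatten to a 1-D array
        D_vec = D_vec.A1
    except AttributeError:
        # np.ndarray (from csr_array input) has no .A1 and is already 1-D
        pass
    D_vec_invsqrt_corr = 1 / np.sqrt(D_vec)
    D_invsqrt_corr = sp.diags(D_vec_invsqrt_corr)
    return D_invsqrt_corr @ A @ D_invsqrt_corr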
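An alternative that avoids the try...except is to flatten the degree vector the same way for both input types. The following is only a sketch: the name calc_A_hat_any is hypothetical and not part of the repository, and it assumes a SciPy version that provides sp.csr_array.

```python
import numpy as np
import scipy.sparse as sp


def calc_A_hat_any(adj_matrix):
    """Symmetrically normalized adjacency with self-loops (hypothetical helper)."""
    nnodes = adj_matrix.shape[0]
    A = adj_matrix + sp.eye(nnodes)
    # A.sum(axis=1) is an (n, 1) np.matrix for spmatrix input and a 1-D
    # np.ndarray for sparray input; asarray + ravel flattens both to 1-D.
    D_vec = np.asarray(A.sum(axis=1)).ravel()
    D_invsqrt = sp.diags(1 / np.sqrt(D_vec))
    return D_invsqrt @ A @ D_invsqrt


# Same two-node graph, once as a sparse matrix and once as a sparse array
dense = np.array([[0.0, 1.0], [1.0, 0.0]])
result_matrix = calc_A_hat_any(sp.csr_matrix(dense))
result_array = calc_A_hat_any(sp.csr_array(dense))
```

Both calls should give the same normalized matrix, since only the flattening step depends on the input type.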

The other bug concerns the labels. In the train_stopping_split function, the statement for i in range(max(labels) + 1): requires integer labels.

/ppnp/preprocessing.py:32, in train_stopping_split(idx, labels, ntrain_per_class, nstopping, seed)
     30 rnd_state = np.random.RandomState(seed)
     31 train_idx_split = []
---> 32 for i in range(max(labels) + 1):
     33     train_idx_split.append(rnd_state.choice(
     34             idx[labels == i], ntrain_per_class, replace=False))
     35 train_idx = np.concatenate(train_idx_split)

TypeError: 'numpy.float64' object cannot be interpreted as an integer

But actually the dtype of the labels produced by the networkx_to_sparsegraph function is float32!

# Convert labels to integers
if labels is None:
    class_names = None
else:
    try:
        labels = np.array(labels, dtype=np.float32)
        class_names = None
    except ValueError:
        class_names = np.unique(labels)
        class_mapping = {k: i for i, k in enumerate(class_names)}
        labels_int = np.empty(nx_graph.number_of_nodes(), dtype=np.float32)
        for inode, label in enumerate(labels):
            labels_int[inode] = class_mapping[label]
        labels = labels_int

Changing the dtype to an integer type fixes this problem. 😊
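For illustration, one way to get integer labels is np.unique with return_inverse=True. This is a sketch under assumptions, not the fix applied in the repository: labels_to_int is a hypothetical helper, and unlike the code above it also remaps numeric labels to consecutive ids (for example, [0., 2.] becomes [0, 1]).

```python
import numpy as np


def labels_to_int(labels):
    """Map arbitrary labels to consecutive integer class ids (hypothetical helper)."""
    # class_names holds the sorted unique labels; the inverse indices give,
    # for each node, the position of its label in class_names.
    class_names, labels_int = np.unique(np.asarray(labels), return_inverse=True)
    return labels_int.astype(np.int64), class_names


labels_int, class_names = labels_to_int(["cat", "dog", "cat", "bird"])
# With an integer dtype, max(labels) + 1 is usable as a range() bound
n_classes = int(labels_int.max()) + 1
```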

gasteigerjo commented 1 year ago

Thank you for your interest and the very clear description of these issues!

  1. csr_array instead of spmatrix: When I created this repository, csr_array didn't even exist. The function's behavior is actually consistent with its type annotation, which specifies sp.spmatrix as the input type. Regardless, I've extended the function to also accept sp.sparray, so it works more smoothly.
  2. This looks like a bug, even if I haven't looked at the code in ages. Fixed. Note that networkx_to_sparsegraph is just a utility function and is not used in the demo. I hope you found it useful despite the issues!