KiddoZhu / NBFNet-PyG

PyG re-implementation of Neural Bellman-Ford Networks (NeurIPS 2021)
MIT License

rspmm seems to behave differently from the spmm in TorchDrug #11

Open hellohawaii opened 1 year ago

hellohawaii commented 1 year ago

Hi, thank you for your code! However, I found that rspmm seems to behave differently from the spmm in TorchDrug.

How I found this: I tried to load a checkpoint trained with the official repo into the model in this repo and found that the performance was extremely low. I compared the results layer by layer and found that in the function message_and_aggregate of GeneralizedRelationalConv, the result produced by generalized_rspmm differs from the output of functional.generalized_rspmm in the official repo, even though I checked that the inputs are the same.

How to reproduce: In the function message_and_aggregate of GeneralizedRelationalConv, before the line `if self.message_func in self.message2mul:`, add:

```python
torch_drug_edge_index = torch.cat([edge_index, edge_type.unsqueeze(0)], dim=0)
torch_drug_edge_weight = edge_weight
# the adjacency below is graph.adjacency in the official NBFNet repo, produced by
# https://github.com/DeepGraphLearning/torchdrug/blob/6066fbd82360abb5f270cba1eca560af01b8cc90/torchdrug/data/graph.py#L801
adjacency = sparse_coo_tensor(torch_drug_edge_index, torch_drug_edge_weight,
                              size=[num_node, num_node, self.num_relation])
adjacency = adjacency.transpose(0, 1)
```

Then replace the generalized_rspmm call below with functional.generalized_rspmm from TorchDrug, just as the official repo does (the official repo uses relation_input, which is relation in this repo).

Also, add the necessary imports to layers.py:

```python
from torchdrug.layers import functional
from torchdrug.utils import sparse_coo_tensor
```
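For reference, a minimal sketch of what this snippet builds, in plain PyTorch with toy numbers (all shapes and values here are hypothetical, and `torch.sparse_coo_tensor` stands in for TorchDrug's `sparse_coo_tensor` helper):

```python
import torch

# Hypothetical toy graph standing in for the real edge tensors.
num_node, num_relation = 4, 2
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])   # row 0: heads, row 1: tails
edge_type = torch.tensor([0, 1, 0])
edge_weight = torch.ones(3)

# graph.adjacency in TorchDrug is a (num_node, num_node, num_relation)
# sparse COO tensor whose entries are indexed as (head, tail, relation).
indices = torch.cat([edge_index, edge_type.unsqueeze(0)], dim=0)
adjacency = torch.sparse_coo_tensor(
    indices, edge_weight, size=(num_node, num_node, num_relation))

# The line under discussion: swap the head and tail (node) dimensions.
transposed = adjacency.transpose(0, 1).coalesce()
print(transposed.indices())
```

Printing `transposed.indices()` shows that the transpose simply swaps the first two index rows, i.e. every edge (h, t, r) becomes (t, h, r).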
hellohawaii commented 1 year ago

Well, I seem to have figured this out. If I omit the line `adjacency = adjacency.transpose(0, 1)` from the snippet above, I get the same output as the code in this repo. So it is not a problem with the rspmm code itself.

Why does this repo not apply this transpose in the rspmm version? Is this a bug? Or is the code in the official repo wrong?
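To make the discrepancy concrete, here is a toy dense analogue of the relational aggregation (illustrative only; it mimics generalized_rspmm with sum="add", mul="mul" but is not TorchDrug's kernel). Transposing the node dimensions reverses the direction messages flow, so the two conventions give different numerical outputs for the same inputs:

```python
import torch

num_node, num_relation, dim = 3, 1, 2
# a single edge 0 -> 1 with relation 0, indexed (head, tail, relation)
A = torch.zeros(num_node, num_node, num_relation)
A[0, 1, 0] = 1.0

x = torch.arange(num_node * dim, dtype=torch.float).view(num_node, dim)
rel = torch.ones(num_relation, dim)   # toy relation embedding (all ones)

def propagate(adj, x, rel):
    # dense analogue of the mul-then-add aggregation:
    # out[t] = sum over h, r of adj[h, t, r] * (rel[r] * x[h])
    return torch.einsum("htr,rd,hd->td", adj, rel, x)

out = propagate(A, x, rel)                    # messages flow head -> tail
out_T = propagate(A.transpose(0, 1), x, rel)  # messages flow tail -> head
print(out)    # node 1 receives x[0]
print(out_T)  # node 0 receives x[1]
```

With the plain adjacency node 1 aggregates x[0]; with the transposed adjacency node 0 aggregates x[1], which is why a checkpoint trained under one convention fails under the other.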

KiddoZhu commented 10 months ago

Sorry for the late response. Thanks for figuring this out! I think this is not a bug, but a compatibility problem. The original and transposed adjacency matrices are equally expressive for propagation, since we always add flipped triplets to the fact graph.
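A small sketch of why the two conventions are equally expressive, assuming (as in NBFNet) that the inverse of relation r is stored as r + num_relation: once flipped triplets are added, transposing the node dimensions of the adjacency is exactly a permutation that swaps each relation with its inverse, so the model can absorb the difference into its relation parameters. (Toy data; pure PyTorch.)

```python
import torch

num_node, num_relation = 4, 2
heads = torch.tensor([0, 1, 2])
tails = torch.tensor([1, 2, 3])
rels = torch.tensor([0, 1, 0])

# augment the fact graph with flipped triplets (t, h, r + num_relation)
h_all = torch.cat([heads, tails])
t_all = torch.cat([tails, heads])
r_all = torch.cat([rels, rels + num_relation])

indices = torch.stack([h_all, t_all, r_all])
values = torch.ones(indices.shape[1])
A = torch.sparse_coo_tensor(indices, values,
                            size=(num_node, num_node, 2 * num_relation))

# build A with every relation swapped for its inverse: r <-> r + num_relation
perm = torch.cat([torch.arange(num_relation) + num_relation,
                  torch.arange(num_relation)])
swap_idx = indices.clone()
swap_idx[2] = perm[indices[2]]
A_swapped = torch.sparse_coo_tensor(swap_idx, values, size=tuple(A.shape))

# transposing the node dimensions equals swapping relations with inverses
same = torch.equal(A.transpose(0, 1).coalesce().to_dense(),
                   A_swapped.coalesce().to_dense())
print(same)  # True
```

So the untransposed convention is a relabeling of the transposed one, which is why both repos can reach the same quality, but checkpoints are not interchangeable between them.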