I have some questions about implementation details. QoR prediction is a graph-level task, while HOGA is trained on a per-node basis. How do you ensure that all nodes of a graph end up in the same batch? And if training is performed per graph, how do you set a unified batch_size? I sincerely hope to get your reply, thank you very much.
For my own understanding, you can define the GraphReadout component shown in Figure 2b of the paper on your own; the explanation for this part is in Section 2.1. You can choose whichever pooling layer you prefer, for example mean pooling, sum pooling, or attention-based pooling.
The code does not seem to include this part, but I think it should be fine; adding it would only require a few lines of code, e.g. something like the sketch below.
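To illustrate what I mean (this is just my own sketch, not code from the repo; names like `graph_mean_pool` and `node_embeddings` are my own), a mean-pooling readout could be as simple as:

```python
import torch

def graph_mean_pool(node_embeddings: torch.Tensor) -> torch.Tensor:
    """Mean-pool the final per-node embeddings of ONE graph into a single
    graph-level vector.

    node_embeddings: [num_nodes, hidden_dim] tensor produced by HOGA's
    per-node hop-wise readout (Figure 2b) for a single graph.
    Returns: [hidden_dim] graph representation fed to a QoR regressor.
    """
    return node_embeddings.mean(dim=0)

# Sum pooling would be node_embeddings.sum(dim=0); attention-based pooling
# would learn per-node weights and take a weighted sum instead.
```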
If I have misunderstood anything, please point it out and correct me, maintainer :)
Thank you for your reply.
If I understand correctly, the readout function in Figure 2b is used to aggregate a node's representations across different hops into its final representation; it is not a pooling function over the whole graph.
My question is how this design pattern can be used to train on graph-level tasks. In this case, the significant problem is that the graph-level loss function I set cannot be computed within a batch that is split by nodes. For example, my batch size is 512, but my graph has 1000 nodes.
Not sure if I stated my question clearly.
Hi,
Thank you for your interest! For graph-level tasks, we begin by preprocessing the dataset so that each graph is represented by the hop-wise features of all its nodes. These hop-wise features are treated as a single sample for training. During mini-batch training, instead of loading a batch of nodes, we load the hop-wise features that correspond to a batch of graphs. In other words, our batch size determines how many graphs to train per batch. This ensures that all nodes required for computing each graph's loss are included. Hope this answers your question.
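To make this concrete, here is a minimal sketch of what such a per-graph dataset and loader could look like (the class and function names are my own illustrations, not the actual repo code; I assume hop-wise features have already been precomputed per graph):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class GraphHopDataset(Dataset):
    """One sample = one graph, stored as the precomputed hop-wise features
    of all its nodes ([M_i, K, N]) plus its graph-level label (e.g. QoR)."""

    def __init__(self, hop_feats, labels):
        self.hop_feats = hop_feats   # list of [M_i, K, N] tensors
        self.labels = labels         # list of graph-level targets

    def __len__(self):
        return len(self.hop_feats)

    def __getitem__(self, idx):
        return self.hop_feats[idx], torch.as_tensor(self.labels[idx])

def collate_graphs(batch):
    """Concatenate the node features of all graphs in the batch along the
    node axis and record which graph each node belongs to."""
    feats, labels = zip(*batch)
    x = torch.cat(feats, dim=0)                        # [M_1+...+M_C, K, N]
    batch_index = torch.cat([torch.full((f.shape[0],), i, dtype=torch.long)
                             for i, f in enumerate(feats)])
    return x, batch_index, torch.stack(labels)

# batch_size here counts graphs, not nodes:
# loader = DataLoader(GraphHopDataset(feats, qor_labels), batch_size=C,
#                     shuffle=True, collate_fn=collate_graphs)
```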
Thanks for your reply, and sorry to bother you again. I'm not sure I completely understand. Suppose we set the number of hops to K, the node embedding dimension to N, and the batch_size to C. Does it mean that the data loaded in each batch has dimension C*M*K*N, where M is the maximum number of nodes among these C graphs? Does this mean we need some padding operations? Or do you mean we stack the K-hop features of the C graphs and obtain an (M_1 + ... + M_C)*K*N tensor? In that situation, when the number of nodes is particularly large, say hundreds of thousands of nodes, will OOM occur?
Yes, we stack the K-hop features into a third-order tensor of shape (M_1 + ... + M_C, K, N), so there is no need for padding.
Regarding your second question, yes, loading a batch of large graphs with millions of nodes may lead to GPU OOM. However, reducing the batch size (or performing distributed training) can help mitigate this issue. It's also worth noting that prior GNN models face similar memory challenges (#nodes x #layers x #dimension), and HOGA does not have a distinct advantage in terms of memory usage compared to prior GNNs for graph-level tasks.
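For readers following along, here is a small sketch of how the stacked tensor could be reduced back to per-graph predictions using the batch index from the loader above (again my own illustration under those assumptions, not the authors' code; `regressor` is a hypothetical graph-level head):

```python
import torch

def scatter_mean_by_graph(node_out: torch.Tensor,
                          batch_index: torch.Tensor,
                          num_graphs: int) -> torch.Tensor:
    """Average node-level outputs per graph.

    node_out:    [M_1+...+M_C, H] node embeddings after HOGA's hop-wise readout
    batch_index: [M_1+...+M_C] graph id of each node (0 .. num_graphs-1)
    Returns:     [num_graphs, H] graph-level embeddings
    """
    hidden = node_out.shape[1]
    sums = torch.zeros(num_graphs, hidden,
                       device=node_out.device, dtype=node_out.dtype)
    sums.index_add_(0, batch_index, node_out)               # per-graph sums
    counts = torch.bincount(batch_index, minlength=num_graphs).clamp(min=1)
    return sums / counts.unsqueeze(1).to(node_out.dtype)    # per-graph means

# Example with C = 2 graphs of 3 and 2 nodes, hidden size 4:
# node_out    = torch.randn(5, 4)
# batch_index = torch.tensor([0, 0, 0, 1, 1])
# graph_emb   = scatter_mean_by_graph(node_out, batch_index, num_graphs=2)
# pred        = regressor(graph_emb)          # e.g. an nn.Linear(4, 1)
# loss        = torch.nn.functional.mse_loss(pred.squeeze(-1), qor_targets)
```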
Really appreciate your answer! This solved my question.
Hello author, when I was reproducing HOGA, I found that the code in the repository does not include the QoR part. I would like to ask whether this part will be uploaded to GitHub later.