MGitHubL / TMac

Doubts about "Temporal GNN Layer" and "Readout" in the code? #1

Open Yuzuriha-Inori-x opened 6 months ago

Yuzuriha-Inori-x commented 6 months ago

Hello! In the paper, the "Temporal GNN Layer" adds the audio information processed by the GNN and the audio-visual information processed by the GAT. However, I could not find this addition operation in the provided code:

```python
for i, (conv, norm_layer) in enumerate(zip(self.convs, self.Norm_layers)):
    x_dict, _ = conv(x_dict, edge_index_dict, edge_weights_dict)
    x_dict = {key: x.relu() for key, x in x_dict.items()}
    x_dict = {key: F.dropout(x, p=0.1) for key, x in x_dict.items()}
    x_dict = {
        key: norm_layer[key](x, batches[key]) for key, x in x_dict.items()
    }
```

In addition, "Readout" adds the video features and the audio features processed by the GNN, but in your code I do not see any processing of the video features:

```python
graph_embed = self.graph_read_out_audio(x_dict["audio"], batches["audio"])  # [b, d]
# x_dict["audio"]: [3232, 512], batches["audio"]: [3232]

# pred = self.lin(torch.sigmoid(graph_embed))
pred = self.lin(graph_embed)
```

Could you help me understand this? Thanks!

MGitHubL commented 6 months ago

In fact, we are still using the classic GNNlayer, but we input the temporal information as weights. You can find the corresponding code in the weight processing part. For the readout operation, our expression is to make it easier for readers to understand. In fact, the video information has already been transmitted to the audio embedding through the GAT layer, so there is no further integration in readout. You can try a weighted fusion of the two embeddings, which may give better results. The changes to the entire code on the original model are relatively rough, and there are still many areas worth improving. If you are interested, you may have some interesting ideas. You are welcome to pay attention.