model architecture

Justin1904 / TensorFusionNetworks

PyTorch implementation of Tensor Fusion Networks for multimodal sentiment analysis.

Hi Justin, many thanks for the contribution. I read the paper and your implementation. Could you answer the following questions?

In each subnet, why do you apply dropout to the input before the first layer? y_1 = F.relu(self.linear_1(dropped))
According to Figure 3 in the paper, there are two layers of 128 ReLU units, so why did you implement y_1 = self.linear_1(h) as a plain linear layer, without a ReLU in the forward function, unlike the subnet layers (for example, y_2 = F.relu(self.linear_2(y_1)))?
What is the output of the network? According to the paper, the binary results achieved the best accuracy, so why do you return a scalar value between -3 and 3?

According to the paper, the fusion tensor is 3D. What are you doing here to obtain it?

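# reshape the audio x video outer-product matrix into a column: (batch, (audio_hidden+1)*(video_hidden+1), 1)
fusion_tensor = fusion_tensor.view(-1, (self.audio_hidden + 1) * (self.video_hidden + 1), 1)
# outer product with the text vector, then flatten each sample into one long vector
fusion_tensor = torch.bmm(fusion_tensor, _text_h.unsqueeze(1)).view(batch_size, -1)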

I followed the authors' code, which applies dropout at the input level. I haven't personally tested whether the model does better without it.
I got better results with fewer linear layers after fusion.
The TFN model is a regression model: its outputs range from -3 to 3, as do the ground-truth labels. Binary accuracy is computed by first thresholding both the predictions and the ground-truth regression values at 0, which splits the real values into two classes, and then computing accuracy over those classes. This is also how the authors did it.
The 3D tensor is obtained by taking the outer product of the three input vectors. The same values can be computed by first taking the outer product of the first two input vectors, flattening the resulting matrix into one long vector, and then taking an ordinary vector outer product of that vector with the third vector.
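As a rough illustration of the input-level dropout described in the first answer, here is a minimal PyTorch sketch of a unimodal subnet. The class name `SubNet`, the layer sizes, and the choice of exactly two ReLU layers are assumptions for illustration, not the repository's exact code; only the `dropped` / `linear_1` / `linear_2` pattern comes from the snippets quoted in the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubNet(nn.Module):
    """Hypothetical unimodal subnet: dropout on the raw input, then two ReLU layers."""

    def __init__(self, in_size: int, hidden_size: int, dropout: float) -> None:
        super().__init__()
        self.drop = nn.Dropout(p=dropout)
        self.linear_1 = nn.Linear(in_size, hidden_size)
        self.linear_2 = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dropout acts on the input features themselves, before any linear layer.
        dropped = self.drop(x)
        y_1 = F.relu(self.linear_1(dropped))
        y_2 = F.relu(self.linear_2(y_1))
        return y_2


subnet = SubNet(in_size=5, hidden_size=4, dropout=0.2)
subnet.eval()  # nn.Dropout is the identity in eval mode
out = subnet(torch.randn(8, 5))
print(out.shape)  # torch.Size([8, 4])
```

Dropping inputs rather than hidden units regularizes against over-reliance on individual raw features; whether that helps here is untested, as the answer notes.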
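The second answer (fewer linear layers after fusion) and the -3 to 3 output range can be sketched with a post-fusion head whose depth is a parameter, so one- and two-hidden-layer variants are easy to compare. `PostFusionHead`, `num_hidden`, the dropout on the fused vector, and the sigmoid-then-rescale output are all hypothetical choices made here; rescaling a sigmoid to (-3, 3) is one way to bound a regression output, and the issue does not say which method the repository uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PostFusionHead(nn.Module):
    """Hypothetical post-fusion regressor with a configurable number of ReLU hidden layers."""

    def __init__(self, in_size: int, hidden_size: int, num_hidden: int, dropout: float) -> None:
        super().__init__()
        # Dropout on the fused vector is an assumption of this sketch.
        self.drop = nn.Dropout(p=dropout)
        sizes = [in_size] + [hidden_size] * num_hidden
        self.hidden = nn.ModuleList([nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])])
        self.out = nn.Linear(sizes[-1], 1)

    def forward(self, fusion: torch.Tensor) -> torch.Tensor:
        h = self.drop(fusion)
        for layer in self.hidden:
            h = F.relu(layer(h))
        # Sigmoid gives (0, 1); scaling by 6 and shifting by -3 maps it onto the label range.
        return torch.sigmoid(self.out(h)) * 6.0 - 3.0


x = torch.randn(4, 32)  # stand-in for a flattened fusion tensor
for depth in (2, 1):
    head = PostFusionHead(in_size=32, hidden_size=128, num_hidden=depth, dropout=0.1).eval()
    print(depth, head(x).shape)  # torch.Size([4, 1]) for both depths
```

Keeping the depth as a constructor argument makes the "fewer layers" comparison a one-line change rather than a code edit.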
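The thresholding procedure from the third answer can be written in a few lines of plain Python. `binary_accuracy` is a hypothetical helper, and sending values exactly equal to 0 to the positive class (`>=`) is an assumption; the answer only says the split is made at 0.

```python
from typing import Sequence


def binary_accuracy(preds: Sequence[float], labels: Sequence[float], threshold: float = 0.0) -> float:
    """Accuracy after splitting both regression outputs and labels at `threshold`."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    # Each real value becomes a class: True if >= threshold (an assumption for exact zeros).
    pred_cls = [p >= threshold for p in preds]
    label_cls = [y >= threshold for y in labels]
    correct = sum(p == y for p, y in zip(pred_cls, label_cls))
    return correct / len(labels)


print(binary_accuracy([2.1, -0.4, 0.3, -2.5], [1.8, 0.6, 1.0, -3.0]))  # 0.75
```

Here the second prediction (-0.4) falls on the negative side while its label (0.6) is positive, so three of the four samples are classified correctly.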
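To check the equivalence claimed in the last answer, the sketch below builds the fusion vector with the same two-step `torch.bmm` route as the snippet in the question and compares it with a direct three-way outer product computed by `torch.einsum`. The helpers `append_one` and `tensor_fusion` are hypothetical names; adding a constant 1 to each modality vector (here at the front) is inferred from the `+ 1` in `(self.audio_hidden + 1)` and mirrors the paper's construction.

```python
import torch


def append_one(h: torch.Tensor) -> torch.Tensor:
    """Prepend a constant 1 to each row: (B, D) -> (B, D + 1)."""
    ones = torch.ones(h.size(0), 1, dtype=h.dtype)
    return torch.cat((ones, h), dim=1)


def tensor_fusion(audio_h: torch.Tensor, video_h: torch.Tensor, text_h: torch.Tensor) -> torch.Tensor:
    """Fuse three modality vectors with two batched outer products (the bmm route)."""
    batch_size = audio_h.size(0)
    _audio_h, _video_h, _text_h = append_one(audio_h), append_one(video_h), append_one(text_h)
    # Audio x video outer product: (B, A+1, 1) @ (B, 1, V+1) -> (B, A+1, V+1)
    fusion_tensor = torch.bmm(_audio_h.unsqueeze(2), _video_h.unsqueeze(1))
    # Flatten the matrix into a column: (B, (A+1)*(V+1), 1)
    fusion_tensor = fusion_tensor.view(batch_size, -1, 1)
    # Outer product with text, then flatten: (B, (A+1)*(V+1)*(T+1))
    return torch.bmm(fusion_tensor, _text_h.unsqueeze(1)).view(batch_size, -1)


torch.manual_seed(0)
a, v, t = torch.randn(2, 3), torch.randn(2, 4), torch.randn(2, 5)
fused = tensor_fusion(a, v, t)
# Direct 3-way outer product, flattened in the same (a, v, t) order.
reference = torch.einsum("ba,bv,bt->bavt", append_one(a), append_one(v), append_one(t)).reshape(2, -1)
print(fused.shape)                       # torch.Size([2, 120])
print(torch.allclose(fused, reference))  # True
```

The bmm route flattens indices in (audio, video, text) order, which is exactly the order `reshape` uses on the `(B, A+1, V+1, T+1)` einsum result, so the two agree element by element: (3+1)(4+1)(5+1) = 120 values per sample.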

model architecture #3