Open linhaojia13 opened 2 years ago
We empirically found that nn.Bilinear(...)
results in good performance.
I remember that the performance gap is quite big.
(ToDo) I will report the ablation study soon. :zap:
Nonetheless, in my personal opinion, this function can be seen as the 'simplest' version of a conditional MLP.
As you can see in the code above, the bilinear function takes the same input x for both arguments (line 37). Accordingly, we can write
y = Bilinear(x, x) = (x^T W) x = W' x = Linear(x | W'),
where W' = x^T W
is a weight matrix conditioned on the input x.
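To make the identity above concrete, here is a minimal sketch (not the repo's code) that builds the input-conditioned weight W' = x^T W from a `torch.nn.Bilinear` layer's 3-D weight tensor and checks that applying it as a plain linear map reproduces `Bilinear(x, x)`:

```python
# Sketch: Bilinear(x, x) == Linear(x | W') where W' depends on x.
import torch

torch.manual_seed(0)
d_in, d_out = 4, 3
bilinear = torch.nn.Bilinear(d_in, d_in, d_out, bias=False)

x = torch.randn(1, d_in)
y = bilinear(x, x)  # y_k = x^T A_k x

# bilinear.weight has shape (d_out, d_in, d_in); each slice A_k is one
# bilinear form. Contract x against the first input dimension to get
# the input-conditioned weight W', whose k-th row is (x^T A_k):
W_prime = torch.einsum('i,kij->kj', x[0], bilinear.weight)

# Applying W' as an ordinary linear layer recovers the bilinear output.
y_linear = x @ W_prime.t()
assert torch.allclose(y, y_linear, atol=1e-6)
```

So for a fixed input the layer is just a linear map, but the map itself changes with x, which is why it behaves like a (minimal) conditional MLP.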
In short, I don't think this function serves much special 'purpose' on its own.
However, nn.Bilinear(...)
can be viewed as another type of MLP, as its name implies.
I hope this explanation of my understanding is satisfactory.
Thank you for your detailed explanation. I think it is reasonable to regard the bilinear layer as the simplest implementation of a conditional MLP. I'm looking forward to seeing the results of the ablation study. Thank you!
What's the purpose of using 'torch.nn.Bilinear'? The formulation of a bilinear transformation is y = x_1^T A x_2 + b, and the formulation of a linear transformation is y = x A^T + b. It seems that a bilinear layer just applies a slightly more sophisticated linear transformation than the linear layer?
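For reference, the two formulations from the question can be compared side by side in a small sketch (the dimensions here are arbitrary, chosen only for illustration): `nn.Linear` holds a 2-D weight A, while `nn.Bilinear` holds a 3-D weight and takes two inputs.

```python
# linear:   y = x A^T + b        (one input, A is 2-D)
# bilinear: y = x1^T A x2 + b    (two inputs, A is 3-D)
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(5, 3)          # weight shape: (3, 5)
bilinear = torch.nn.Bilinear(5, 7, 3)   # weight shape: (3, 5, 7)

x1 = torch.randn(2, 5)
x2 = torch.randn(2, 7)
print(linear(x1).shape)        # torch.Size([2, 3])
print(bilinear(x1, x2).shape)  # torch.Size([2, 3])
```

Both produce the same output shape, but the bilinear layer mixes two inputs multiplicatively instead of applying a single fixed matrix.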